| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 573
|
|
| Author |
|
|
Garrulus glandarius
Advanced Cruncher Romania Joined: Apr 10, 2025 Post Count: 89 Status: Offline Project Badges:
|
Just had a few dozen tasks validated, all from the current tests. Still got 400+ waiting for validation since August, but at least something is moving.
|
||
|
|
wildhagen
Veteran Cruncher The Netherlands Joined: Jun 5, 2009 Post Count: 1004 Status: Offline Project Badges:
|
Work seems to be flowing again. I am getting a lot of new work units, although these are still not "production" units but test units in the MCM1_9999995 range.
|
||
|
|
MJH333
Senior Cruncher England Joined: Apr 3, 2021 Post Count: 300 Status: Offline Project Badges:
|
I picked up 59 of these MCM1_9999995 test units between 09:43 and 10:20 UTC today (26 October).
As of now, 25 of these units have validated. In most cases, my wingman and I each claimed and received about 40 credits. In a few cases, my wingman claimed 202.5 (a known issue!). In one case, my wingman claimed only 10: https://www.worldcommunitygrid.org/contribution/workunit/765899390. Cheers, Mark |
||
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2501 Status: Offline Project Badges:
|
Well, the new batch must be over by now. I have had my computer turned off, and when turning it on again, I did not get any new tasks.
Let's see if Dylan sends out any more batches. Validation of the previous test batches seems to be ongoing, though. So, that's progress...
Edit, added: I think the validation of previous test tasks stopped well before all test tasks had been validated.
[Edit 2 times, last edit by Grumpy Swede at Oct 26, 2025 2:49:55 PM] |
||
|
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1296 Status: Offline Project Badges:
|
I'm seeing more test WUs validated. So I'm seeing progress.
Looks like a new test batch went out (9999995). I haven't gotten any of these.
Edit to add: all my test WUs that have a second result have validated. That is, the few pending I have are due to the lack of a second result.
[Edit 1 times, last edit by Unixchick at Oct 26, 2025 7:18:38 PM] |
||
|
|
dylanht
World Community Grid Tech Joined: Jul 1, 2021 Post Count: 35 Status: Offline Project Badges:
|
Expect either a few more test batches or, tentatively and in the best case, the first contiguous normal batches back in the "MCM1_024%" range coming down the pipe later today.
Redeployed the reducers, validators, and assimilators after fixing copious configuration issues/misunderstandings, bugs, and oversights that were causing the low validation throughput.
A small victory: the "replay" (https://docs.redpanda.com/current/develop/data-transforms/deploy/#reprocess which describes --from-offset, used here to effectively re-do a bunch of workunit processing by pointing into the past in the upload event queue) seemed to go well, as far as our bare-bones Grafana+Prometheus monitoring dashboard for Kafka could show. The transforms were able to keep up with the burst of re-reading a few million upload events and building up their reduced map of workunit names to uploaded result files. When a workunit reaches quorum, its result files are emitted as a new event for the validators to consume, and the key in the map is deleted in the transform.
There was also an issue with the validators, where the payload buffer of the "successful validation" event emission was getting corrupted because I was wrongly emitting the event asynchronously and then trying to use the buffer again for the next pair of result files. There were other, even more embarrassing bugs, but now the validator properly flushes the buffer and emits the message synchronously, while still churning through results quickly, especially when it doesn't have to fall back to NFS to look for a result file.
The assimilators seemed to build up their big credit dump and "state fast-tracking" transaction across the workunit, result, host, user, and team tables, and to respect the check for canonical and validate_state for idempotence.
I am going to try to start the transitioner today, which should start picking up necessary resends for the test batches.
If that works and I don't see any evidence that resuming scheduled MCM1 batches will make another mess too big to clean up while live, I will resume the normal MCM1 range and schedule the batches to go out at about the clip the assimilators seemed capable of handling per hour in the replay "stress test". |
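
The reduce-until-quorum step described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual WCG transform: the class name `QuorumReducer`, the `replay` helper, the quorum size of 2 (inferred from the mention of pairs of result files), and the workunit and file names are all hypothetical.

```python
from dataclasses import dataclass, field

QUORUM = 2  # assumption: two results per workunit, inferred from "pair of result files"


@dataclass
class QuorumReducer:
    """Hypothetical stand-in for the reducer transform described in the post."""
    quorum: int = QUORUM
    # Reduced map: workunit name -> result files uploaded so far.
    pending: dict = field(default_factory=dict)

    def on_upload(self, workunit, result_file):
        """Record one upload event; return a quorum event, or None if not at quorum."""
        files = self.pending.setdefault(workunit, [])
        if result_file not in files:  # a replay may deliver the same upload twice
            files.append(result_file)
        if len(files) < self.quorum:
            return None
        # Quorum reached: emit an event for the validators and delete the key.
        del self.pending[workunit]
        return {"workunit": workunit, "results": sorted(files)}


def replay(reducer, events):
    """Feed (workunit, result_file) upload events through the reducer in order."""
    emitted = []
    for workunit, result_file in events:
        event = reducer.on_upload(workunit, result_file)
        if event is not None:
            emitted.append(event)
    return emitted


# Hypothetical upload events, including one duplicate as a replay might produce.
uploads = [
    ("MCM1_9999995_0001", "MCM1_9999995_0001_0"),
    ("MCM1_9999995_0002", "MCM1_9999995_0002_0"),
    ("MCM1_9999995_0001", "MCM1_9999995_0001_0"),
    ("MCM1_9999995_0001", "MCM1_9999995_0001_1"),
]
reducer = QuorumReducer()
quorum_events = replay(reducer, uploads)
```

In this sketch the first workunit reaches quorum and is emitted once, while the second stays in the map waiting for its second result. Because the key is deleted on emission, a late duplicate upload would start a fresh entry, which is one reason a downstream idempotence check like the one mentioned for the assimilators is useful.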
||
|
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Offline Project Badges:
|
Thanks for the update Dylan, especially on the weekend...
I hope that, with all these updates and changes, it is not forgotten to reflect all the work done in the Results page, as well as to produce external stats again.
Also, just to pick up an almost stone-age topic, since you mentioned replaying stuff from the past: could that possibly, eventually, magically also include all those WUs that were returned 2 1/2 years ago when WCG was restarted after the transition from IBM? There were a lot of promises (and outright lies) back then, so at least I would welcome an honest answer about this, even if that answer is that those roughly 4 months of work results are all lost by now...
thanks, Ralf |
||
|
|
dylanht
World Community Grid Tech Joined: Jul 1, 2021 Post Count: 35 Status: Offline Project Badges:
|
Hi Ralf,
Not forgotten. Changes to the result state are committed by the new assimilator to the BOINC database, and that should be reflected in the results page, which fetches that data from the database via the APIs. Once I re-schedule the stats export scripts, which are currently disabled, the external stats at https://download.worldcommunitygrid.org/boinc/stats/ and the stats on the website under the community tab should start updating again.
Regarding communications, often based on my reporting, that turned into outright falsehoods: there is no excuse for that. We always planned and tried to achieve what we said we would do; we just failed almost every time for years.
About that particular case, until we get to the end of the current issues I cannot give a firm answer. We were taking TSM backups of all NFS mounts from that period, and in theory that would include the BOINC database backups and files necessary to make good on that promise. Optimistically, there is a chance. First, I must get all of MCM1, MAM1 (beta30, then launch), and ARP1 up and running again. Once I do, I can try the restore from TSM and get a firm answer.
Best, Dylan |
||
|
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Offline Project Badges:
|
Thanks Dylan,
my critique was aimed less at you than at a former "communications" person who was outright lying and then got all mad about it when it was pointed out. Let's hope that's in the past. It's just that, with all the communication issues over the last 3+ years now, it is at times really frustrating when there are these constant prolonged outages, which always seem kind of self-inflicted.
Let's see what happens next week,
thanks again, Ralf |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1317 Status: Offline Project Badges:
|
Dylan,
Thanks for the technical details! I have a question relating to all the workunits that seem to have been stuck awaiting assimilation since the over-full disk issue, and all those stuck Pending Validation because of the migration downtime. Will clearing those out need the old validator and assimilator, or have you got some means of feeding them through your new set-up?
Cheers - Al.
P.S. An attempt to preview or post a slightly longer version of this from my usual "postings" system got "403 - Forbidden" messages...
[Edit 1 times, last edit by alanb1951 at Oct 27, 2025 2:24:08 AM] |
||
|
|
|