World Community Grid Forums
Thread Status: Active Total posts in this thread: 600
jos12345
Cruncher Belgium Joined: Aug 21, 2025 Post Count: 3 Status: Offline
Hey Dylan, many thanks and respect for all the work you put into it.
dylanht
World Community Grid Tech Joined: Jul 1, 2021 Post Count: 35 Status: Offline
Hi Al,
Looking at the Apache logs, I see just over 1000 such cases, all parsed from 404 log lines since roughly Nov 3rd around 16:40 UTC, from the following 10 batches (9 of the 10 were created with the new system). There are probably more in the rotated/archived logs:
0241027 0241652 0241721 0241727 0241781 0241784 0241791 0241843 0241896 0241921

My best guess as to why this is happening is that some requests are being redispatched to the wrong backend when the initial HTTP request, which should be routed to the right server/bucket, fails. For the few specific files I checked, the bucket was not in the contiguous range of the node's "partition". While I wait for my new indexes to finish building in the BOINC database so I can restart with an updated server config as well, I'll review the HAProxy config and see whether it is as simple as removing the redispatch directive. If there is another way the request is falling through to round-robin and skipping my use-server directive, that might take a bit more research to make sure I don't break what is working.

Part of the difficulty in reconciling the workunits that were uploaded right after we restarted was an off-by-one error in my routing policy. It caused nearly every POST to miss the use-server directive that is supposed to direct it to the correct partition, so instead of going to the partition corresponding to its contiguous 1/6th of the 1024 fanout buckets (which I have been rambling about in updates), it went willy-nilly everywhere, as I discovered later. For uploads, I did create all 1024 buckets so they would accept files from any server in this case, but not for downloads, although the result would still have been a 404 if my theory holds.

The 403s might be lock contention in the DB2 instance that holds the website and forum database, but I'm not so sure about that theory yet. I started looking into it while hosting was fixing the Ceph issue, but I didn't get far in the exploratory phase.
[Edit 2 times, last edit by dylanht at Nov 7, 2025 9:23:14 PM]
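The routing idea in the post (1024 fanout buckets, each node owning a contiguous 1/6th of them, plus an off-by-one that sent requests to the wrong place) can be sketched in Python. This is a minimal illustration under stated assumptions, not WCG's actual HAProxy policy: the md5-mod-1024 bucket function, the `bucket * 6 // 1024` partition formula, the server names `node1`..`node6`, and every function name below are hypothetical. The 0-based versus 1-based naming mismatch is just one way an off-by-one could cause a request to miss its intended server.

```python
import hashlib
from typing import Callable, Optional

FANOUT = 1024   # fanout buckets, as described in the post
NUM_NODES = 6   # each node owns a contiguous ~1/6th of the buckets

# Hypothetical server names; the real HAProxy server names are not given.
SERVERS = [f"node{i}" for i in range(1, NUM_NODES + 1)]


def bucket_for(filename: str, fanout: int = FANOUT) -> int:
    """Hypothetical bucket function: md5 of the file name, reduced mod fanout.

    BOINC's real fanout hash may differ; this is only a stand-in.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return int(digest, 16) % fanout


def partition_for(bucket: int, fanout: int = FANOUT, nodes: int = NUM_NODES) -> int:
    """0-based index of the node whose contiguous bucket range contains `bucket`."""
    if not 0 <= bucket < fanout:
        raise ValueError(f"bucket {bucket} outside 0..{fanout - 1}")
    return bucket * nodes // fanout


def partition_range(node: int, fanout: int = FANOUT, nodes: int = NUM_NODES) -> range:
    """Contiguous buckets owned by `node`; consistent with partition_for()."""
    start = -(-node * fanout // nodes)        # ceil(node * fanout / nodes)
    end = -(-(node + 1) * fanout // nodes)    # ceil((node + 1) * fanout / nodes)
    return range(start, end)


def server_name(bucket: int) -> str:
    """Correct: convert the 0-based partition index to a 1-based server name."""
    return f"node{partition_for(bucket) + 1}"


def buggy_server_name(bucket: int) -> str:
    """Off-by-one: uses the 0-based index directly, so buckets 0..170 ask for a
    nonexistent "node0" and every other bucket lands one node too low."""
    return f"node{partition_for(bucket)}"


def route(bucket: int, name_fn: Callable[[int], str]) -> Optional[str]:
    """Return the pinned server, or None when no server matches (the request
    would then fall through to the default balancing, e.g. round-robin)."""
    name = name_fn(bucket)
    return name if name in SERVERS else None


if __name__ == "__main__":
    print([len(partition_range(k)) for k in range(NUM_NODES)])  # [171, 171, 170, 171, 171, 170]
    print(route(500, server_name), route(500, buggy_server_name))  # node3 node2
    print(route(0, buggy_server_name))  # None
```

A check like `partition_for(bucket_for(name)) == node` is the kind of test the post describes for spotting files whose bucket falls outside a node's contiguous range. In this toy version only the first node's buckets miss entirely; the rest are pinned to the wrong neighbour.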
dylanht
World Community Grid Tech Joined: Jul 1, 2021 Post Count: 35 Status: Offline
Update on the BOINC database maintenance: the queries defining a handful of new indices, which I hope will ameliorate several of the ongoing issues (including the lock contention causing crashes), have completed after taking much longer than I expected. It will be a bit longer before I can reactivate the feeder, because I am trying to push config changes that should improve the performance of the BOINC database. Volunteers may experience oddities with any API that talks to the BOINC database over the next few hours; I am hopeful I can bring the feeder back online in that timeframe.
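What a new index changes for a single lookup can be illustrated with a small, self-contained Python sketch. It uses SQLite's `EXPLAIN QUERY PLAN` purely for convenience: BOINC servers typically run MySQL/MariaDB, whose planner and locking differ, and the `result` table and `result_workunitid` index below are hypothetical simplifications, not BOINC's real schema. The point is only the plan change: without an index the lookup scans the whole table, and with one it searches just the matching entries. (In MySQL's InnoDB, a locking statement with no suitable index can end up locking every row it scans, which is one reason indexes can reduce lock contention.)

```python
import sqlite3
from typing import List, Tuple


def query_plan(conn: sqlite3.Connection, sql: str, params: Tuple = ()) -> List[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # 'detail' is the last column of each row


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified table; not BOINC's actual schema.
conn.execute(
    "CREATE TABLE result (id INTEGER PRIMARY KEY, workunitid INTEGER, server_state INTEGER)"
)
conn.executemany(
    "INSERT INTO result (workunitid, server_state) VALUES (?, ?)",
    [(i // 4, i % 5) for i in range(10_000)],  # 4 results per workunit
)

QUERY = "SELECT id FROM result WHERE workunitid = ?"

before = query_plan(conn, QUERY, (42,))  # full table scan, e.g. "SCAN result"
conn.execute("CREATE INDEX result_workunitid ON result (workunitid)")
after = query_plan(conn, QUERY, (42,))   # search via the result_workunitid index

rows = conn.execute(QUERY, (42,)).fetchall()  # same answer either way

print(before)
print(after)
print(len(rows))  # 4
```

The trade-off raised later in the thread applies here too: every index speeds up matching lookups but must be maintained on each insert and update, which is also why building new indices on a large, busy table can take a long time.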
RoKKor
Cruncher Joined: Oct 10, 2016 Post Count: 8 Status: Offline
Aaah! Thanks for the update!
That's why my machines have run out of WCG WUs. Most of my machines still have quite a queue of WUs that are finished and ready to upload but just sitting there. Well... while this is going on, I think I'll just build another BOINC machine with recycled hardware. Got nothing better to do over the weekend. lol
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 396 Status: Offline
Thanks for the updates dylanht!
My machines have been switched to Folding@home. I will switch them back once my returned work has been fully and properly processed. Too much instability right now...
[Edit 7 times, last edit by AgrFan at Nov 8, 2025 4:43:12 AM]
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1326 Status: Offline
Dylan,
Thanks for the detailed responses on progress.
I dread to think how many items needed to be incorporated in each new index -- not really surprised to know it has been taking a while! It will be interesting to see what sort of long-term performance effects that might have on BOINC d/b access: index maintenance and usage versus full table scans? :-)
Good luck with any fixes...
Cheers - Al.
P.S. I had to back out of the forums and go back in again to get this to post. Side effect of proxy fixes?
Hans Sveen
Veteran Cruncher Norge Joined: Feb 18, 2008 Post Count: 997 Status: Recently Active
Something is moving; I had a couple of WUs that had finished but not been reported, and now they are gone 😉
Looking forward to the next move! Hans S.
amsanity
Cruncher Joined: Nov 6, 2025 Post Count: 3 Status: Offline
Thanks for the update. Hopefully I can put the machine to use shortly.
RoKKor
Cruncher Joined: Oct 10, 2016 Post Count: 8 Status: Offline
All my WCG WUs that were done and waiting for upload are gone, but I haven't received any new ones yet.
|
||
|
|
IT022906
Cruncher Joined: Feb 4, 2005 Post Count: 26 Status: Offline
Same for me.
|
||
|
|
|