Thread Status: Active
Total posts in this thread: 600
jos12345
Cruncher
Belgium
Joined: Aug 21, 2025
Post Count: 3
Re: Project Status (First Post Updated)

Hey Dylan, many thanks and respect for all the work you put into it.
[Nov 7, 2025 9:02:43 PM]
dylanht
World Community Grid Tech
Joined: Jul 1, 2021
Post Count: 35
Re: Project Status (First Post Updated)

Hi Al,

Looking at the Apache logs, I see just over 1000 such cases from the following 10 batches, 9 of which were created with the new system. All were parsed from 404 log lines since roughly Nov 3rd around 16:40 UTC, and there are probably more in the rotated/archived logs:
0241027 0241652 0241721 0241727 0241781 0241784 0241791 0241843 0241896 0241921
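
For anyone who wants to run a similar tally on their own logs, here is a minimal sketch of counting 404s per batch. It is not the script used above. It assumes Apache common/combined log format, and it assumes the batch ID is a 7-digit field set off by underscores in the requested file name; both the regexes and the function name are hypothetical.

```python
import re
from collections import Counter

# Assumption: Apache common/combined log format, where the quoted request
# line is followed by the 3-digit status code.
REQUEST_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)

# Assumption: the batch ID is a 7-digit field delimited by underscores in
# the file name. Adjust this to the real naming scheme.
BATCH_RE = re.compile(r"_(?P<batch>\d{7})_")


def count_404s_by_batch(lines):
    """Return a Counter mapping batch ID -> number of 404 log lines."""
    counts = Counter()
    for line in lines:
        request = REQUEST_RE.search(line)
        if request is None or request.group("status") != "404":
            continue  # not a parsable request line, or not a 404
        batch = BATCH_RE.search(request.group("path"))
        if batch is not None:
            counts[batch.group("batch")] += 1
    return counts
```

Feeding it the lines of the current log plus any rotated/archived logs, then sorting the keys of the result, would give a batch list like the one above.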

My best guess as to why this is happening is that some requests are being redispatched to the wrong backend when the initial HTTP request, which should be routed to the right server/bucket, fails. For the few specific files I checked, the bucket was not in the contiguous range of the node's "partition". While I wait for my new indexes to finish creating in the BOINC database so I can restart with an updated server config as well, I'll review the HAProxy config and see if it is as simple as removing the redispatch directive. If there is another way the request is falling through to round-robin and skipping my use-server directive, that might take a bit more research to make sure I don't mess up what is working.

Part of the reason for reconciling the workunits that were uploaded right after we restarted was an off-by-one error in my routing policy. It caused nearly every POST to miss the use-server directive that is supposed to direct it to the correct partition. Instead of going to the partition corresponding to the contiguous 1/6th of the 1024 fanout buckets I have been rambling about in updates, it just went willy-nilly everywhere, which I discovered later. For uploads, I did create all 1024 buckets to accept from any server in this case, but not for downloads, although the result would still have been a 404 if my theory holds.
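
To make the partition idea concrete, here is a rough sketch of mapping the 1024 fanout buckets onto contiguous ranges, one per node. It is an illustration only, not the actual routing policy. The node count of 6 is inferred from the "1/6th" above, and the even-split rule and all function names are assumptions, so the real boundaries may differ.

```python
N_BUCKETS = 1024  # fanout buckets, as stated in the post
N_NODES = 6       # inferred from "contiguous 1/6th"; an assumption


def partition_for_bucket(bucket, n_buckets=N_BUCKETS, n_nodes=N_NODES):
    """Map a bucket to the node owning its contiguous range.

    Assumption: buckets are split as evenly as possible, in order.
    """
    if not 0 <= bucket < n_buckets:
        raise ValueError(f"bucket {bucket} outside 0..{n_buckets - 1}")
    return bucket * n_nodes // n_buckets


def partition_ranges(n_buckets=N_BUCKETS, n_nodes=N_NODES):
    """Return {node: (first_bucket, last_bucket)} with inclusive bounds."""
    ranges = {}
    for bucket in range(n_buckets):
        node = partition_for_bucket(bucket, n_buckets, n_nodes)
        first, _ = ranges.get(node, (bucket, bucket))
        ranges[node] = (first, bucket)
    return ranges


def is_misrouted(bucket, serving_node):
    """True if a request for this bucket landed on a node that does not own it."""
    return partition_for_bucket(bucket) != serving_node
```

Because 1024 does not divide evenly by 6, this rule gives ranges of 170 or 171 buckets. A check like `is_misrouted` is the kind of test that flags a file whose bucket sits outside the serving node's range.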

The 403s might be lock contention in the DB2 instance that holds the website and forum database, but I'm not so sure about that theory yet. I started looking into it while hosting was fixing the Ceph issue, but I didn't get far in the exploratory phase.
----------------------------------------
[Edit 2 times, last edit by dylanht at Nov 7, 2025 9:23:14 PM]
[Nov 7, 2025 9:20:32 PM]
dylanht
World Community Grid Tech
Joined: Jul 1, 2021
Post Count: 35
Re: Project Status (First Post Updated)

Update on the BOINC database maintenance: the queries to define a handful of new indices, which I hope will ameliorate several of the ongoing issues including the lock contention causing crashes, have completed after taking much longer than I thought they would. It will be a bit longer before I can reactivate the feeder; I am trying to push config changes that should help improve performance of the BOINC database. Volunteers may experience oddities with any API that talks to the BOINC database over the next few hours. I am hopeful I can bring the feeder back online in that timeframe.
[Nov 8, 2025 1:33:24 AM]
RoKKor
Cruncher
Joined: Oct 10, 2016
Post Count: 8
Re: Project Status (First Post Updated)

Aaah! Thanks for the update!

That's why I have exhausted all WCG WUs in my machines. Most of my machines still have quite a queue of WUs that are finished and ready to upload but just sitting there.
Well... while this is going on, I think I'll just build another BOINC machine with recycled hardware. Got nothing better to do over the weekend. lol
[Nov 8, 2025 2:06:08 AM]
AgrFan
Senior Cruncher
USA
Joined: Apr 17, 2008
Post Count: 396
Re: Project Status (First Post Updated)

Thanks for the updates, dylanht!

My machines have been switched to Folding@home.
I will switch them back after my returned work is fully processed properly.

Too much instability right now ...
----------------------------------------

  • i5-10400 (Comet Lake, 6C/12T) @ 2.9 GHz
  • i5-7400 (Kaby Lake, 4C/4T) @ 3.0 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3330 (Ivy Bridge, 4C/4T) @ 3.0 GHz

----------------------------------------
[Edit 7 times, last edit by AgrFan at Nov 8, 2025 4:43:12 AM]
[Nov 8, 2025 4:14:35 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1326
Re: Project Status (First Post Updated)

Dylan,

Thanks for the detailed responses on progress. I dread to think how many items needed to be incorporated in each new index -- not really surprised to know it has been taking a while! It will be interesting to see what sort of long-term performance effects that might have on BOINC d/b access -- index maintenance and usage versus full table scans? :-)

Good luck with any fixes...

Cheers - Al.

P.S. I had to back out of the forums and go back in again to get this to post. Side effect of proxy fixes?
[Nov 8, 2025 5:40:17 AM]
Hans Sveen
Veteran Cruncher
Norge
Joined: Feb 18, 2008
Post Count: 997
Re: Project Status (First Post Updated)

Something is moving; I had a couple of WUs that had finished but not reported, and now they are gone 😉

Looking forward to next move!

Hans S.
[Nov 8, 2025 4:19:03 PM]
amsanity
Cruncher
Joined: Nov 6, 2025
Post Count: 3
Re: Project Status (First Post Updated)

Thanks for the update. Hopefully I can put the machine to use shortly.
[Nov 8, 2025 4:27:52 PM]
RoKKor
Cruncher
Joined: Oct 10, 2016
Post Count: 8
Re: Project Status (First Post Updated)

All my WCG WUs that were done and waiting for upload are gone. But I didn't get any new ones yet.
[Nov 8, 2025 5:22:13 PM]
IT022906
Cruncher
Joined: Feb 4, 2005
Post Count: 26
Re: Project Status (First Post Updated)

same for me
[Nov 9, 2025 4:24:37 AM]