Thread Status: Active
Total posts in this thread: 618
This topic has been viewed 52018 times and has 617 replies
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 448
Status: Offline
Re: Project Status (First Post Updated)

"httpd logs filled up the disk, tried restarting the feeder only but
websphere also ended up in a bad state, archived and rotated the logs and
restarted the websphere server and apps, if all goes well back up in a few
minutes here"
I wonder if there has been any thought of automating those tasks on a daily basis or other appropriate time frame.

But a large THANKS for getting the system back online!
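
On the automation question, a minimal sketch of what a scheduled clean-up job could look like is below. Everything in it is an assumption for illustration (the function name, the directory layout, the "*.log" pattern and the one-day cutoff); nothing here reflects how WCG's servers are actually set up.

```python
import gzip
import shutil
import time
from pathlib import Path


def archive_old_logs(log_dir, archive_dir, max_age_days=1.0, pattern="*.log"):
    """Gzip log files older than max_age_days into archive_dir, then delete them.

    Returns the list of archive paths created. All names are hypothetical.
    """
    log_dir, archive_dir = Path(log_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    cutoff = time.time() - max_age_days * 86400
    archived = []
    for log_file in sorted(log_dir.glob(pattern)):
        # Skip directories and anything modified more recently than the cutoff.
        if not log_file.is_file() or log_file.stat().st_mtime > cutoff:
            continue
        target = archive_dir / (log_file.name + ".gz")
        with log_file.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Remove the original only after the compressed copy has been written.
        log_file.unlink()
        archived.append(target)
    return archived
```

Run from cron or a similar scheduler, something like this skips anything modified within the cutoff, so a log that is still being written is left alone; the live httpd log would still need the server's own rotation mechanism.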
[Dec 8, 2025 9:48:02 PM]
Paul Schlaffer
Senior Cruncher
USA
Joined: Jun 12, 2005
Post Count: 279
Status: Recently Active
Re: Project Status (First Post Updated)


WCG are probably well aware of many of these variants, as they can periodically sift the hosts table looking at the os_name and os_version fields for items that don't match up to certain expected patterns[*1] -- unfortunately, the BOINC scheduler seems to trust what it is told so it doesn't reject stuff that doesn't actually make sense.


That is the part I don't fully understand. The event log at startup is showing the correct OS (a Windows variant) AND any available WSL2 OS installed. Therefore, WCG should know what the correct (main) OS is, and they should also know any WSL installs which are available to them, should they choose to utilize that feature. It would seem the issue is with how WCG is applying the information that the BOINC client is providing.
----------------------------------------
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
----------------------------------------
[Edit 1 times, last edit by Paul Schlaffer at Dec 8, 2025 9:51:48 PM]
[Dec 8, 2025 9:50:55 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1328
Status: Recently Active
Re: Project Status (First Post Updated)

It would seem the issue is with how WCG is applying the information that the BOINC client is providing.
Actually, WCG is handling what the client tells it in standard BOINC fashion (which assumes the host doesn't report a non-Windows O/S whilst asking for a Windows binary!), and in these cases the host is mis-reporting the O/S!

Unfortunately, the only ways to prove this are by being able to see what's in a scheduler request from one of these hosts or doing a code dive, though some of the examples I tried to include in my previous post would illustrate the issue (O/S information looking like Linux error reports!)

For an overview: on December 1st I posted some hints as to what's happening (including naming some BOINC source code modules of interest...). In the following discussions, TLD provided samples of scheduler requests for various configurations of a WSL2 host -- those had correct information in each case.

The client needs to query the host O/S to get various details it includes in scheduler requests -- in some cases, a scheduler request is going in with platform details for Windows (proven by stderr reports for alleged Alpine Linux clients showing the MCM1 Windows executable!) but O/S details from whatever is associated with WSL (which might be an error message in some cases, often for tasks that fail in one way or another)...[*1]

Bottom line? It's not WCG's fault that standard BOINC trusts what a scheduler request tells it, or that standard BOINC doesn't check the application platform against the host platform to see if there's a major discrepancy. WCG (or the BOINC maintainers) will need to fix this if the client issues [with old WSL and/or Docker?] can't be resolved.
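
To make the "check the application platform against the host platform" idea concrete, here is a rough sketch. The helper name and the rule (a "windows_" platform should come with an O/S name containing "windows") are my assumptions, and "Microsoft Windows 11" is just a made-up example value; none of this is taken from the BOINC scheduler source.

```python
def is_suspicious_request(platform_name, os_name):
    """Flag a request that asks for Windows binaries but reports a non-Windows O/S.

    Hypothetical helper: the matching rule is an assumption for illustration.
    """
    wants_windows = platform_name.lower().startswith("windows_")
    reports_windows = "windows" in os_name.lower()
    return wants_windows and not reports_windows


if __name__ == "__main__":
    # Platform name as seen in this thread, paired with different O/S names.
    for os_name in ("Microsoft Windows 11", "Ubuntu", "W"):
        print(os_name, is_suspicious_request("windows_x86_64", os_name))
```

A check along these lines would only catch the Windows-binary-with-odd-O/S case discussed here; anything broader would need a proper table of platforms and expected O/S families.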

Cheers - Al

P.S. I tried to find some past posts that may have had suitable samples, but I couldn't remember what thread(s) they were in and the Search function only seems to be able to find posts that were made after the brief outage and restart yesterday afternoon.

*1 I don't think there's much point in "showing my work" on that topic; I have lots of examples, Forbidden and otherwise :-)
----------------------------------------
[Edit 4 times, last edit by alanb1951 at Dec 9, 2025 8:05:48 AM]
[Dec 9, 2025 12:22:35 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1328
Status: Recently Active
Re: Project Status (First Post Updated)

Something completely different...

I was trying to track down some older discussions regarding the bad O/S data issue when compiling my previous post, and the search function wasn't finding anything. I've since determined that it looks as if any posts from before the 16:45 to 18:00 (UTC) service break yesterday are not currently indexed... Anyone else seeing the same?

I wonder whether there's a function to rebuild the indexes I presume the search function needs, and [if so] whether it has been set off. I suspect it is going to take a long time to restore the missing stuff if it hasn't got some sort of recovery mechanism for that situation.

Cheers - Al.
[Dec 9, 2025 8:18:32 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1412
Status: Offline
Re: Project Status (First Post Updated)

Unfortunately, the only ways to prove this are by being able to see what's in a scheduler request from one of these hosts or doing a code dive, though some of the examples I tried to include in my previous post would illustrate the issue (O/S information looking like Linux error reports!)

I tested this with WSL enabled and disabled; the relevant parts of the scheduler_request are shown below:

WSL disabled:
....
<platform_name>windows_x86_64</platform_name>
<alt_platform>
<name>windows_intelx86</name>
</alt_platform>
....
<host_info>
....
<wsl>
<distro>
<distro_name>Ubuntu</distro_name>
<os_name>W</os_name>
<os_version>W</os_version>
<wsl_version>2</wsl_version>
<is_default/>
</distro>
</wsl>



WSL enabled:
....
<platform_name>windows_x86_64</platform_name>
<alt_platform>
<name>windows_intelx86</name>
</alt_platform>
....
<host_info>
....
<wsl>
<distro>
<distro_name>Ubuntu</distro_name>
<os_name>Ubuntu</os_name>
<os_version>Ubuntu 24.04.3 LTS</os_version>
<wsl_version>2</wsl_version>
<is_default/>
<libc_version>2.39</libc_version>
<docker_version>4.9.3</docker_version>
<docker_type>2</docker_type>
</distro>
</wsl>

[Dec 9, 2025 12:17:42 PM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 448
Status: Offline
Re: Project Status (First Post Updated)

2025-11-12 23:59:48 UTC; 2025-11-13 02:24:20 UTC; 2025-11-13 04:29:58 UTC; 2025-11-13 14:08:02 UTC; 2025-11-13 16:45:52 UTC; 2025-11-14 18:56:02 UTC; 2025-11-16 02:33:09 UTC; 2025-11-16 16:50:41 UTC; 2025-11-18 05:12:14 UTC; 2025-11-18 17:31:49 UTC; 2025-11-20 05:27:00 UTC; 2025-11-21 06:08:47 UTC; *Account Last updated: Nov. 23, 2025 - 00:06 UTC; 2025-11-25 07:32:58 UTC; Last updated: Nov. 26, 2025 - 12:06 UTC; 2025-11-28 03:22:28 UTC; 2025-12-04 17:55:59 UTC; 2025-12-09 17:49:07 UTC


The following are as of the current time:

In progress - 229 items; 240 items; 260 items; 253 items; 242 items; 240 items; 241 items; 245 items; 241 items; 104 items; 312 items; 310 items; *21 items; 6 items; 0 items; 248 items; 249 items; 258 items

Pending Validation - 13801 items; 13853 items; 13952 items; 14319 items; 14394 items; 14897 items; 15371 items; 15820 items; 16503 items; 16614 items; 17259 items; 18139 items; *19105 items; 21846 items [oldest 2025-08-21 09:36:32 UTC]; 21579 items [most recent sent time 2025-11-24 19:30:42 UTC]; 19792 items; 21555 items; 24651 items

Valid - 47260 items; 47272 items; 47306 items; 47592 items; 47702 items; 48813 items; 50384 items; 50850 items; 52512 items; 53192 items; 54597 items; 56066 items; *58811 items; 59067 items [oldest 2025-08-08]; 59345 items; 63224 items; 72886 items; 81500 items

Oldest Pending WU - https://www.worldcommunitygrid.org/contribution/workunit/759200440

* - indicates placeholder data, recorded during the "Server error feeder not running" issues that lasted several hours on 11/23

[Dec 9, 2025 6:12:37 PM]
Hans Sveen
Veteran Cruncher
Norge
Joined: Feb 18, 2008
Post Count: 1014
Status: Recently Active
Re: Project Status (First Post Updated)

Hi!
The time (or the number of WUs for the latest 24 hours) of the latest update on the first page of Project Status is not updating!?

https://stuxnode.com/images/wcg.jpg

Hans S.
----------------------------------------
[Edit 3 times, last edit by Hans Sveen at Dec 9, 2025 8:41:33 PM]
[Dec 9, 2025 8:24:21 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1328
Status: Recently Active
Re: Project Status (First Post Updated)

Thanks to Crystal Pellet for that example, which gives a massive clue as to why the O/S information is showing up incorrectly - the "W" case nails it...

Looking back at the examples given by TLD, it can be seen that the os_name and os_version tags appear twice, the second occurrence being inside the wsl block of XML. So perhaps the parser in the server is being lazy and isn't skipping over the wsl block if it is of no interest, the end result being that the second values overwrite the original host_info ones...

I don't know whether there is some BUDA-specific logic that is supposed to deal with that properly; if there is, it is either not in the server version WCG are running or it isn't working as expected :-(

Given that at some point the content of whatever is presented in the scheduler requests as information relevant to BUDA-style tasks will become important, it seems unlikely that a simple "what is the platform name" hack would be a good long-term solution -- unfortunate, that, but it was only a thought!

I might try to find time for another code-dive to see if I can find anything relevant, but I suspect WCG Tech Team already know more about this than I'd be able to deduce...

By the way, I believe the server uses the same XML parser(s) as the client, and the versions I've looked into effectively run through the XML as a flat file rather than something structured (so "lazy" behaviour is possible!). I had looked at this in April 2024 whilst trying to work out why my client log was getting spammed with configuration details on every server request. I now have a version with an inelegant hack that works round that issue, which was caused by a failure to finish parsing the scheduler reply if it found a matching venue tag; that causes problems with newer server versions of the scheduler reply, which put the modification time after the venues.
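
To illustrate the "lazy flat parse" idea (and only the idea; this is Python, not the BOINC parser, and whether the server really behaves this way is still a guess), here is a small sketch. The sample XML is made up: the top-level os_name and os_version values are placeholders of mine, while the wsl values are borrowed from Crystal Pellet's example. Both function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Made-up request fragment: the top-level values are placeholders.
SAMPLE = """<host_info>
    <os_name>Microsoft Windows 11</os_name>
    <os_version>(placeholder version string)</os_version>
    <wsl>
        <distro>
            <distro_name>Ubuntu</distro_name>
            <os_name>Ubuntu</os_name>
            <os_version>Ubuntu 24.04.3 LTS</os_version>
        </distro>
    </wsl>
</host_info>"""


def flat_scan(xml_text, tag):
    """Mimic a flat scan: every occurrence of the tag overwrites the last one."""
    value = None
    for match in re.finditer(rf"<{tag}>(.*?)</{tag}>", xml_text):
        value = match.group(1)
    return value


def structured_lookup(xml_text, tag):
    """Look only at direct children of the root, so the wsl block is ignored."""
    return ET.fromstring(xml_text).findtext(tag)


if __name__ == "__main__":
    print("flat scan:", flat_scan(SAMPLE, "os_name"))
    print("structured:", structured_lookup(SAMPLE, "os_name"))
```

With this sample the flat scan ends up with the value from inside the wsl block, while the structured lookup keeps the top-level one, which is the sort of overwrite I'm suggesting might be happening.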

Cheers - Al.

P.S. It's no fun trying to find old posts when the search still seems to only know about messages after the recent web site restart...
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Dec 9, 2025 10:21:07 PM]
[Dec 9, 2025 10:06:53 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2534
Status: Offline
Re: Project Status (First Post Updated)

The validation rate for tasks crunched before, during (cached tasks), and after the migration is now dropping again. For me, the number of "Pending validation" tasks (crunched by both wingmen) is rising like a rocket.

So the result is, of course, that my RAC (Recent Average Credit) is dropping like a stone.

Global statistics history:
Statistics date   Total run time (y:d:h:m:s)   Points generated   Results returned
12/09/2025        97:347:12:26:03              275,079,998        461,822
12/08/2025        167:324:17:13:44             475,972,547        802,849
12/07/2025        202:086:01:36:11             570,838,318        962,729
12/06/2025        194:342:11:29:37             547,531,669        926,269
12/05/2025        192:023:13:57:44             536,398,856        908,430
12/04/2025        671:144:13:47:42             1,841,668,593      3,134,034

----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Dec 10, 2025 2:35:57 AM]
[Dec 10, 2025 2:34:59 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1307
Status: Offline
Re: Project Status (First Post Updated)

My validation rate has slowed too. I was thinking that it would speed up after that last official announcement.
[Dec 10, 2025 4:17:04 AM]