World Community Grid Forums
Thread Status: Active
Total posts in this thread: 618
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 448 Status: Offline
"httpd logs filled up the disk, tried restarting the feeder only but websphere also ended up in a bad state, archived and rotated the logs and restarted the websphere server and apps, if all goes well back up in a few minutes here"

I wonder if there has been any thought of automating those tasks on a daily basis or some other appropriate time frame. But a large THANKS for getting the system back online!
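The daily automation asked about here is what log-rotation tools such as logrotate are normally used for. As a rough illustration only (not how WCG's servers are actually set up), here is a minimal Python sketch assuming a hypothetical `rotate_logs` helper run once a day from a scheduler such as cron: it gzips `*.log` files older than a day and deletes the originals. A second hypothetical helper, `free_fraction`, reports how much of the disk is still free so a job could raise an alarm before it fills.

```python
import gzip
import shutil
import time
from pathlib import Path


def rotate_logs(log_dir: Path, max_age_seconds: float = 86400.0) -> list[Path]:
    """Gzip and delete *.log files in log_dir older than max_age_seconds.

    Hypothetical helper for illustration; returns the archives it created.
    """
    cutoff = time.time() - max_age_seconds
    archived = []
    for log_file in sorted(log_dir.glob("*.log")):
        if log_file.stat().st_mtime >= cutoff:
            continue  # modified recently; leave it alone
        archive = log_file.with_name(log_file.name + ".gz")
        with log_file.open("rb") as src, gzip.open(archive, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream-compress the old log
        log_file.unlink()  # remove the original once the archive is written
        archived.append(archive)
    return archived


def free_fraction(path: Path) -> float:
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

One caveat with any approach like this: deleting a log file that httpd still has open does not free its space until the server reopens its logs, which is why real rotation setups follow up with a reload or graceful restart of the server.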
||
|
|
Paul Schlaffer
Senior Cruncher USA Joined: Jun 12, 2005 Post Count: 279 Status: Recently Active
"WCG are probably well aware of many of these variants, as they can periodically sift the hosts table looking at the os_name and os_version fields for items that don't match up to certain expected patterns[*1] -- unfortunately, the BOINC scheduler seems to trust what it is told so it doesn't reject stuff that doesn't actually make sense."

That is the part I don't fully understand. The event log at startup shows the correct OS (a Windows variant) AND any installed WSL2 OS. Therefore, WCG should know what the correct (main) OS is, and they should also know about any WSL installs that are available to them, should they choose to utilize that feature. It would seem the issue is with how WCG is applying the information that the BOINC client is providing.
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
[Edit 1 times, last edit by Paul Schlaffer at Dec 8, 2025 9:51:48 PM]
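The "sift the hosts table" idea from the quoted post can be sketched as a pattern check over host records. This is a minimal Python illustration under stated assumptions: `HostRecord`, the regular expressions, and the error hints are all hypothetical; only the `os_name`/`os_version` field names come from the discussion. It flags hosts whose O/S name doesn't look like a known family, or whose version string is implausibly short (like the "W" values reported in this thread) or reads like an error message.

```python
import re
from dataclasses import dataclass


@dataclass
class HostRecord:
    """Hypothetical host row; os_name/os_version mirror the BOINC host fields."""
    host_id: int
    os_name: str
    os_version: str


# Assumed "expected" shapes; a real sift would need a longer, curated list.
EXPECTED_OS_NAME = re.compile(r"^(Microsoft Windows|Linux|Darwin)\b")
ERROR_HINTS = re.compile(r"error|failed|not found", re.IGNORECASE)


def suspicious_hosts(hosts: list[HostRecord]) -> list[int]:
    """Return ids of hosts whose O/S fields don't fit the expected shapes."""
    flagged = []
    for host in hosts:
        bad_name = not EXPECTED_OS_NAME.match(host.os_name)
        # Very short versions (like "W") or error-message text are suspect.
        bad_version = len(host.os_version) < 3 or bool(ERROR_HINTS.search(host.os_version))
        if bad_name or bad_version:
            flagged.append(host.host_id)
    return flagged
```

A pattern list like this only catches what it anticipates, which is why a server-side platform check would still be useful alongside it.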
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1328 Status: Recently Active
"It would seem the issue is with how WCG is applying the information that the BOINC client is providing."

Actually, WCG is handling what the client tells it in standard BOINC fashion (which assumes the host doesn't report a non-Windows O/S whilst asking for a Windows binary!), and in these cases the host is mis-reporting the O/S! Unfortunately, the only ways to prove this are by being able to see what's in a scheduler request from one of these hosts or doing a code dive, though some of the examples I tried to include in my previous post would illustrate the issue (O/S information looking like Linux error reports!)

For an overview, I posted some hints as to what's happening on December 1st (including naming some BOINC source code modules of interest...). In the discussion that followed, TLD provided samples of scheduler requests for various combinations of settings on a WSL2 host; those had correct information in each case.

The client needs to query the host O/S to get various details it includes in scheduler requests. In some cases, a scheduler request is going in with platform details for Windows (proven by stderr reports for alleged Alpine Linux clients showing the MCM1 Windows executable!) but O/S details from whatever is associated with WSL (which might be an error message in some cases, often for tasks that fail in one way or another)...[*1]

Bottom line? It's not WCG's fault that standard BOINC trusts what a scheduler request tells it, or that standard BOINC doesn't check the application platform against the host platform to see if there's a major discrepancy. WCG (or the BOINC maintainers) will need to fix this if the client issues [with old WSL and/or Docker?] can't be resolved.

Cheers - Al

P.S. I tried to find some past posts that may have had suitable samples, but I couldn't remember what thread(s) they were in, and the Search function only seems to be able to find posts made after the brief outage and restart yesterday afternoon.

*1 I don't think there's much point in "showing my work" on that topic; I have lots of examples, Forbidden and otherwise.

[Edit 4 times, last edit by alanb1951 at Dec 9, 2025 8:05:48 AM]
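The cross-check described as missing above, comparing the application platform against the reported O/S, could look something like this minimal Python sketch. The prefix table and the `platform_matches_os` name are hypothetical, and this is not BOINC code; the platform strings (`windows_x86_64`, `x86_64-pc-linux-gnu`, `x86_64-apple-darwin`) are standard BOINC platform names, but the O/S-name prefixes paired with them are assumptions.

```python
# Hypothetical mapping from a BOINC platform-name prefix to the os_name
# prefix we'd expect to see alongside it. Illustration only, not BOINC logic.
EXPECTED_OS_PREFIX = {
    "windows_": "Microsoft Windows",
    "x86_64-pc-linux-gnu": "Linux",
    "x86_64-apple-darwin": "Darwin",
}


def platform_matches_os(platform_name: str, os_name: str) -> bool:
    """True if os_name looks consistent with platform_name.

    Platforms with no rule in the table pass, so only combinations known
    to be inconsistent are rejected.
    """
    for platform_prefix, os_prefix in EXPECTED_OS_PREFIX.items():
        if platform_name.startswith(platform_prefix):
            return os_name.startswith(os_prefix)
    return True  # no rule for this platform; don't reject it
```

Letting unknown platforms pass keeps the check conservative: it would catch a Windows platform paired with a Linux-style O/S name without blocking platforms the table simply doesn't cover.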
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1328 Status: Recently Active
Something completely different...
I was trying to track down some older discussions regarding the bad O/S data issue when compiling my previous post, and the search function wasn't finding anything. I've since determined that it looks as if any posts from before yesterday's 16:45 to 18:00 (UTC) service break are not currently indexed... Is anyone else seeing the same? I wonder whether there's a function to rebuild the indexes I presume the search function needs, and [if so] whether it has been set off. I suspect it is going to take a long time to restore the missing entries if it hasn't got some sort of recovery mechanism for that situation. Cheers - Al.
||
|
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1412 Status: Offline
"Unfortunately, the only ways to prove this are by being able to see what's in a scheduler request from one of these hosts or doing a code dive, though some of the examples I tried to include in my previous post would illustrate the issue (O/S information looking like Linux error reports!)"

I tested this with WSL enabled and disabled; here are the relevant parts of scheduler_request:

WSL disabled:
....
<platform_name>windows_x86_64</platform_name>
<alt_platform>
    <name>windows_intelx86</name>
</alt_platform>
....
<host_info>
....
<wsl>
    <distro>
        <distro_name>Ubuntu</distro_name>
        <os_name>W</os_name>
        <os_version>W</os_version>
        <wsl_version>2</wsl_version>
        <is_default/>
    </distro>
</wsl>

WSL enabled:
....
<platform_name>windows_x86_64</platform_name>
<alt_platform>
    <name>windows_intelx86</name>
</alt_platform>
....
<host_info>
....
<wsl>
    <distro>
        <distro_name>Ubuntu</distro_name>
        <os_name>Ubuntu</os_name>
        <os_version>Ubuntu 24.04.3 LTS</os_version>
        <wsl_version>2</wsl_version>
        <is_default/>
        <libc_version>2.39</libc_version>
        <docker_version>4.9.3</docker_version>
        <docker_type>2</docker_type>
    </distro>
</wsl>
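Read with a structure-aware parser, the host-level `os_name` and the per-distro `os_name` inside `<wsl>` in requests like these are clearly separate values. A minimal Python sketch using the standard library's `xml.etree.ElementTree`: the trimmed request is modelled on the "WSL enabled" sample, but the top-level `os_name`/`os_version` values are hypothetical placeholders (the sample elides them), and `summarize` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for a scheduler_request. The top-level
# os_name/os_version values are hypothetical placeholders.
REQUEST = """
<scheduler_request>
  <platform_name>windows_x86_64</platform_name>
  <alt_platform><name>windows_intelx86</name></alt_platform>
  <host_info>
    <os_name>Microsoft Windows 11</os_name>
    <os_version>Professional x64 Edition</os_version>
    <wsl>
      <distro>
        <distro_name>Ubuntu</distro_name>
        <os_name>Ubuntu</os_name>
        <os_version>Ubuntu 24.04.3 LTS</os_version>
        <wsl_version>2</wsl_version>
        <is_default/>
      </distro>
    </wsl>
  </host_info>
</scheduler_request>
"""


def summarize(xml_text: str) -> dict:
    """Pull out the platform, host O/S and per-distro WSL details separately."""
    root = ET.fromstring(xml_text.strip())
    host = root.find("host_info")
    return {
        "platform": root.findtext("platform_name"),
        # A bare tag name only matches direct children of host_info,
        # so the os_name nested inside <wsl> is not picked up here.
        "host_os": host.findtext("os_name"),
        "wsl_distros": [
            (d.findtext("distro_name"), d.findtext("os_version"))
            for d in host.findall("wsl/distro")
        ],
    }
```

With scoped lookups like these, the Windows platform and the WSL distro details never compete for the same field.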
||
|
|
bfmorse
Senior Cruncher US Joined: Jul 26, 2009 Post Count: 448 Status: Offline
Counts at each snapshot time:

| Snapshot (UTC) | In progress | Pending Validation | Valid |
| 2025-11-12 23:59:48 | 229 | 13801 | 47260 |
| 2025-11-13 02:24:20 | 240 | 13853 | 47272 |
| 2025-11-13 04:29:58 | 260 | 13952 | 47306 |
| 2025-11-13 14:08:02 | 253 | 14319 | 47592 |
| 2025-11-13 16:45:52 | 242 | 14394 | 47702 |
| 2025-11-14 18:56:02 | 240 | 14897 | 48813 |
| 2025-11-16 02:33:09 | 241 | 15371 | 50384 |
| 2025-11-16 16:50:41 | 245 | 15820 | 50850 |
| 2025-11-18 05:12:14 | 241 | 16503 | 52512 |
| 2025-11-18 17:31:49 | 104 | 16614 | 53192 |
| 2025-11-20 05:27:00 | 312 | 17259 | 54597 |
| 2025-11-21 06:08:47 | 310 | 18139 | 56066 |
| *2025-11-23 00:06 (Account last updated) | *21 | *19105 | *58811 |
| 2025-11-25 07:32:58 | 6 | 21846 (oldest 2025-08-21 09:36:32 UTC) | 59067 (oldest 2025-08-08) |
| 2025-11-26 12:06 (Last updated) | 0 | 21579 (most recent sent time 2025-11-24 19:30:42 UTC) | 59345 |
| 2025-11-28 03:22:28 | 248 | 19792 | 63224 |
| 2025-12-04 17:55:59 | 249 | 21555 | 72886 |
| 2025-12-09 17:49:07 | 258 | 24651 | 81500 |

Oldest Pending WU - https://www.worldcommunitygrid.org/contribution/workunit/759200440

* indicates placeholder data, recorded during the "Server error feeder not running" issues over several hours on 11/23.
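These snapshots can be turned into per-day rates with a little arithmetic. A minimal Python sketch; the two snapshot values are copied from bfmorse's figures, and `daily_rates` is an illustrative helper, not a WCG tool.

```python
from datetime import datetime

# The last two of bfmorse's snapshots (times in UTC).
SNAPSHOTS = {
    "2025-12-04 17:55:59": {"pending": 21555, "valid": 72886},
    "2025-12-09 17:49:07": {"pending": 24651, "valid": 81500},
}


def daily_rates(earlier: str, later: str) -> dict:
    """Average change per day in each count between two snapshots."""
    fmt = "%Y-%m-%d %H:%M:%S"
    elapsed = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    days = elapsed.total_seconds() / 86400
    a, b = SNAPSHOTS[earlier], SNAPSHOTS[later]
    return {key: (b[key] - a[key]) / days for key in a}
```

For 2025-12-04 to 2025-12-09 (just under five days) this gives roughly 1,720 items per day moving to Valid, while Pending Validation still grew by roughly 620 per day.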
||
|
|
Hans Sveen
Veteran Cruncher Norge Joined: Feb 18, 2008 Post Count: 1014 Status: Recently Active
Hi!
The time of the latest update (or the number of WUs for the latest 24 hours) on the first page of Project Status is not updating!? https://stuxnode.com/images/wcg.jpg Hans S.
[Edit 3 times, last edit by Hans Sveen at Dec 9, 2025 8:41:33 PM]
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1328 Status: Recently Active
Thanks to Crystal Pellet for that example, which gives a massive clue as to why the O/S information is showing up incorrectly - the "W" case nails it...
Looking back at the examples given by TLD, it can be seen that the os_name and os_version tags appear twice, the second occurrence being inside the wsl block of XML. So perhaps the parser in the server is being lazy and isn't skipping over the wsl block if it is of no interest, the end result being that the second values overwrite the original host_info ones...

I don't know whether there is some BUDA-specific logic that is supposed to deal with that properly; if there is, it is either not in the server version WCG are running or it isn't working as expected.

Given that at some point the content of whatever is presented in the scheduler requests as information relevant to BUDA-style tasks will become important, it seems unlikely that a simple "what is the platform name" hack would be a good long-term solution; unfortunate, that, but it was only a thought! I might try to find time for another code dive to see if I can find anything relevant, but I suspect the WCG Tech Team already know more about this than I'd be able to deduce...

By the way, I believe the server uses the same XML parser(s) as the client, and the versions I've looked into effectively run through the XML as a flat file rather than something structured (so "lazy" behaviour is possible!). I had looked at this in April 2024 whilst trying to work out why my client log was getting spammed with configuration details on every server request. I now have a version with an inelegant hack that works round that issue (which was caused by failure to finish parsing the scheduler reply if it found a matching venue tag, causing problems with newer server versions of the scheduler reply that put the modification time after the venues...)

Cheers - Al.

P.S. It's no fun trying to find old posts when the search still seems to only know about messages after the recent web site restart...

[Edit 1 times, last edit by alanb1951 at Dec 9, 2025 10:21:07 PM]
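The suspected "lazy" flat-scan behaviour is easy to reproduce in miniature. This Python sketch is hypothetical and is not BOINC's parser (BOINC's is C++): `flat_parse` treats every `<os_name>`/`<os_version>` line the same, so the copy inside `<wsl>` overwrites the host value, while `scoped_parse` simply skips the wsl block. The top-level values are placeholders; the "W" values come from Crystal Pellet's WSL-disabled sample.

```python
import re

# os_name/os_version appear once in host_info and again inside <wsl>.
# The top-level values are hypothetical placeholders.
REQUEST = """<host_info>
<os_name>Microsoft Windows 11</os_name>
<os_version>Professional x64 Edition</os_version>
<wsl>
<distro>
<distro_name>Ubuntu</distro_name>
<os_name>W</os_name>
<os_version>W</os_version>
</distro>
</wsl>
</host_info>"""

TAG = re.compile(r"<(os_name|os_version)>(.*?)</\1>")


def flat_parse(xml_text: str) -> dict:
    """Mimic a flat scan: every matching tag overwrites the previous value."""
    fields = {}
    for line in xml_text.splitlines():
        m = TAG.search(line)
        if m:
            fields[m.group(1)] = m.group(2)  # the later, nested copy wins
    return fields


def scoped_parse(xml_text: str) -> dict:
    """Same scan, but ignore everything between <wsl> and </wsl>."""
    fields, in_wsl = {}, False
    for line in xml_text.splitlines():
        stripped = line.strip()
        if stripped == "<wsl>":
            in_wsl = True
        elif stripped == "</wsl>":
            in_wsl = False
        elif not in_wsl:
            m = TAG.search(line)
            if m:
                fields[m.group(1)] = m.group(2)
    return fields
```

The flat scan ends up reporting "W" as the host O/S, which matches the kind of bogus host records seen in this thread.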
||
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2534 Status: Offline
The validation rate for tasks crunched before the migration, during it (cached tasks), and after it is now dropping again. For me, the number of "Pending validation" tasks (crunched by both wingmen) is rising like a rocket.
The result, of course, is that my RAC (Recent Average Credit) is dropping like a stone.
[Edit 1 times, last edit by Grumpy Swede at Dec 10, 2025 2:35:57 AM]
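Why RAC falls while results sit in Pending Validation: credit is only granted when a result validates, and BOINC's Recent Average Credit is an exponentially decaying average with a half-life of one week. A simplified Python sketch of that update (the function name is illustrative, and BOINC's own averaging code differs in detail): with no newly granted credit, RAC halves every seven days.

```python
import math

HALF_LIFE_DAYS = 7.0  # BOINC documents a one-week half-life for RAC


def update_rac(rac: float, new_credit: float, elapsed_days: float) -> float:
    """Decay the previous RAC over elapsed_days, then blend in new credit.

    Simplified sketch: credit granted during the interval is treated as a
    per-day rate and weighted by the share of the old average that decayed.
    """
    if elapsed_days <= 0:
        return rac  # no time has passed; nothing to decay
    weight = math.exp(-elapsed_days * math.log(2) / HALF_LIFE_DAYS)
    return rac * weight + (1.0 - weight) * (new_credit / elapsed_days)
```

For example, `update_rac(1000.0, 0.0, 7.0)` comes out at about 500: a week with nothing validated halves RAC, even though the work itself has been done and is waiting for credit.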
||
|
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1307 Status: Offline
My validation rate has slowed too. I was thinking that it would speed up after that last official announcement.