| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 567
|
|
| Author |
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2492 Status: Offline Project Badges:
|
Well, I got 120 tasks for one already running computer, but when I started another computer it's just "no tasks are available".
Either the issue came back almost immediately, or every empty computer is trying to get new work at the same time.
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Dec 1, 2025 9:49:10 PM] |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1316 Status: Offline Project Badges:
|
My last "other platforms" message was at 20:34:38 UTC today, and the first successful request for new work was at 20:36:18 UTC. Each system loaded up its full profile-defined quota in short order, so any subsequent requests made before a result was returned would just get told "no tasks" because I was at quota!
Unfortunately, by the time I returned my first few completed tasks about an hour later it was back to "other platforms" again (thanks, Grumpy Swede, for the heads up about that - glad I check before posting!); I can't offer an estimate of how much earlier the problem returned...

I wonder whether this issue relates to an increase in the number of Windows users who have a WSL/WSL2 instance (possibly courtesy of Berkeley's message about Docker?) that ends up mis-reporting its O/S as a Linux flavour when it is actually asking for or reporting Windows executables. If that happens and something needs a retry, the standard BOINC server code checks will stop anyone from being able to access that WU unless all existing results have the same problem! [*1] I don't know, though, how many such items would need to be in the send cache to cause issues for most/all users.

I'll be interested in the eventual fix report(s) for this issue when it gets resolved. It is probably why the issues of truncated O/S names and hr_class issues were grouped together in the first bullet point of the 2025-11-21 Operational Status update -- finding out what the actual platform for a mis-attributed O/S in a returned result should be (when trying to send another one) looks challenging, and way beyond my current code-dive level(!), so I wish Tech Team the very best of luck doing anything more than catching it at result report time to stop the bad O/S information getting into the database in the first place! [*2]

Cheers - Al.

[Edited to clarify that I can't give a reliable time for when the problem returned...]

*1 If any user wants to check this, a code dive into sched_send.cpp, sched_hr.cpp and hr.cpp should satisfy curiosity.
*2 As for why some versions of the BOINC client report an O/S inconsistent with the actual platform, software has bugs ...
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Dec 1, 2025 10:33:29 PM] |
||
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2492 Status: Offline Project Badges:
|
@alanb1951
Very interesting analysis of the possible cause of this issue. I think you are on to something there. I sent a link to your post to Igor, who probably will give it to Dylan. |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1316 Status: Offline Project Badges:
|
@alanb1951
Very interesting analysis of the possible cause of this issue. I think you are on to something there. I sent a link to your post to Igor, who probably will give it to Dylan.

If so, I do hope Dylan doesn't think that I'm automatically assuming he didn't know that already -- I only posted the speculation because I get restless when there's no obvious explanation out there!

It's the same when I sometimes post some fairly detailed status analysis because I'm frustrated by the lack of anything faintly resembling server status data (albeit that's a problem inherited from IBM...)

Cheers - Al.

P.S. I used to dislike "armchair DBAs/SysAdmins" when I was working -- I try to not be one myself, but...
[Edit 1 times, last edit by alanb1951 at Dec 1, 2025 11:16:26 PM] |
||
|
|
Grumpy Swede
Master Cruncher Svíþjóð Joined: Apr 10, 2020 Post Count: 2492 Status: Offline Project Badges:
|
I'm sure, Alan, that Dylan will not take it the wrong way.
New tasks are coming in here now. Since I only run this project, I'd better make sure that I have work for at least 2 days in the cache. |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1316 Status: Offline Project Badges:
|
New tasks are coming in here now. Since I only run this project, I'd better make sure that I have work for at least 2 days in the cache.

Yes -- that latest "other platforms" seems to have been from some time between 20:50 and 21:38 UTC to some time between 23:15 and 23:20.

Interestingly, everything I received up to 20:50 was new work, then everything that I received between 23:22 and 23:26 was a retry; after that (23:29 and onwards) everything is new work again.

Wondering about that wrong O/S thing, I had a look at a few of the retries I'm dealing with, but at a first glance it looks as if the retries were for overdue Darwin tasks. (I might come back to that later once I can see a full list after my next wingmen script runs.) Interestingly, some of the retries were O/S type "Linux" and O/S version "Docker Desktop" -- it'll be interesting to see what happens to those at validation time...

Cheers - Al. |
||
|
|
Paul Schlaffer
Senior Cruncher USA Joined: Jun 12, 2005 Post Count: 278 Status: Offline Project Badges:
|
If WSL is causing a problem it's important to know. I'm one of those that just enabled it when updating to 8.2.8. I do see in the task detail that the OS is incorrectly reported. Upon startup the correct OS shows in the Event Log along with the available WSL Distro. Therefore, you'd think this wouldn't be an issue.
----------------------------------------
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
|
||
|
|
Hans Sveen
Veteran Cruncher Norge Joined: Feb 18, 2008 Post Count: 981 Status: Offline Project Badges:
|
My guess is it has "something" to do with WSL, take a look at this workunit: https://www.worldcommunitygrid.org/contributi...071_9340,2,-Result%20name
----------------------------------------
[Edit 2 times, last edit by Hans Sveen at Dec 2, 2025 8:46:25 AM] |
||
|
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1316 Status: Offline Project Badges:
|
I've seen some of those bad O/S type fields (as in Hans Sveen's example above) when a pair of Windows hosts failed to download and Linux got a look in for the retries! That bad O/S type was always associated with client 8.2.4, for what that might be worth. (And I had to modify my collection script because that field was actually multi-line and blew up my scanner logic!)
Anyway, I've just had a quick skim through recent workunits for which I've sent in a result or had a download error, and I can see quite a few viable WSL2 examples as wingmen amongst my validated WUs. For instance my wingman for MCM1_0243131_2048 (WU ID 783060213) reports as Linux Ubuntu with a microsoft-standard-WSL2 kernel version and it has run a Linux executable as one would hope! And that's a typical example.

I've also seen a fair number of Alpine Linux wingmen; in some cases where my task failed to download I've even seen a pair of these validate against one another, both having run a Windows executable (because they reported a Windows platform, whatever the O/S information might have said at the time I sampled the associated result data!) -- for instance, MCM1_0242404_6801 (WU ID 776714012).

And, as I mentioned in an earlier post, I've got some retries where a wingman is reporting as on Linux Docker Desktop, but I'm still waiting for any of those to return to get more details.

There are almost certainly multiple reasons for incorrect O/S reporting from Windows systems, some of which might be [user?] configuration errors but others might be client issues. As I have no Windows systems I can't experiment...

Cheers - Al.

P.S. There have been past mentions of this sort of thing in other parts of the WCG forums, and I seem to recall having seen it raised elsewhere too. Certainly, CPDN has had some stuff about getting Linux tasks on Windows platforms, and some are running a BOINC client under WSL to do so (rather than using a VM), I think...
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Dec 2, 2025 9:35:24 AM] |
||
|
|
Paul Schlaffer
Senior Cruncher USA Joined: Jun 12, 2005 Post Count: 278 Status: Offline Project Badges:
|
My guess is it has "something" to do with WSL, take a look at this workunit: https://www.worldcommunitygrid.org/contributi...071_9340,2,-Result%20name

The one on the top appears to be due to an installation error. I have had no MCM errors or invalids after installing it over the weekend. However, since the validator appears to match OS with OS (all the ones I've looked at have Alpine Linux matches), I could see this slowing things down due to it being less common than simply Windows.

Running the TOP command, I don't see MCM actually running on WSL, which I wouldn't expect it to. At this point, I'm leaning toward uninstalling the feature.
|
||
|
|
|