Thread Status: Active
Total posts in this thread: 567
This topic has been viewed 42042 times and has 566 replies
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2492
Status: Offline
Re: Project Status (First Post Updated)

Well, I got 120 tasks for one computer that was already running, but when I started another computer it just got "no tasks are available".

Either the issue came back almost immediately, or every empty computer is trying to get new work at the same time.
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Dec 1, 2025 9:49:10 PM]
[Dec 1, 2025 9:46:09 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Offline
Re: Project Status (First Post Updated)

My last "other platforms" message was at 20:34:38 UTC today, and the first successful request for new work was at 20:36:18 UTC. Each system loaded up its full profile-defined quota in short order, so any subsequent requests made before a result was returned would just get told "no tasks" because I was at quota!

Unfortunately, by the time I returned my first few completed tasks about an hour later it was back to "other platforms" again (thanks, Grumpy Swede, for the heads up about that - glad I check before posting!); I can't offer an estimate of how much earlier the problem returned...

I wonder whether this issue relates to an increase in the number of Windows users who have a WSL/WSL2 instance (possibly courtesy of Berkeley's message about Docker?) that ends up mis-reporting its O/S as a Linux flavour when it is actually asking for or reporting Windows executables. If that happens and something needs a retry, the standard BOINC server code checks will stop anyone from being able to access that WU unless all existing results have the same problem![*1] I don't know, though, how many such items would need to be in the send cache to cause issues for most/all users.
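
The gating speculated about above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the BOINC server code: the names `WorkUnit`, `hr_class_of`, `can_send` and `send` are hypothetical, and the class is reduced to the reported O/S family, whereas the real checks may be finer-grained.

```python
from dataclasses import dataclass
from typing import Optional


def hr_class_of(reported_os: str) -> str:
    """Toy classifier: map a host's *reported* O/S name to a coarse class.

    Assumption: this looks only at the O/S string; the real server-side
    classes may take other host details into account.
    """
    name = reported_os.lower()
    if "windows" in name:
        return "windows"
    if "darwin" in name:
        return "darwin"
    return "linux"


@dataclass
class WorkUnit:
    name: str
    hr_class: Optional[str] = None  # unset until the first copy is sent

    def can_send(self, reported_os: str) -> bool:
        """A host may receive a copy only if its class matches the WU's."""
        return self.hr_class is None or self.hr_class == hr_class_of(reported_os)

    def send(self, reported_os: str) -> bool:
        """Send a copy if allowed; the first send pins the WU's class."""
        if not self.can_send(reported_os):
            return False
        self.hr_class = hr_class_of(reported_os)
        return True
```

In this toy model, a work unit first sent to a host that reported a Linux O/S is afterwards refused to any host reporting Windows, whatever executable the first host actually ran.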

I'll be interested in the eventual fix report(s) for this issue when it gets resolved. It is probably why the issues of truncated O/S names and hr_class issues were grouped together in the first bullet point of the 2025-11-21 Operational Status update ;-) -- finding out what the actual platform for a mis-attributed O/S in a returned result should be (when trying to send another one) looks challenging, and way beyond my current code-dive level(!), so I wish Tech Team the very best of luck doing anything more than catching it at result report time to stop the bad O/S information getting into the database in the first place![*2]

Cheers - Al.

[Edited to clarify that I can't give a reliable time for when the problem returned...]

*1 If any user wants to check this, a code dive into sched_send.cpp, sched_hr.cpp and hr.cpp should satisfy curiosity :-)

*2 As for why some versions of the BOINC client report an O/S inconsistent with the actual platform, software has bugs :-( ...
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Dec 1, 2025 10:33:29 PM]
[Dec 1, 2025 10:27:07 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2492
Status: Offline
Re: Project Status (First Post Updated)

@alanb1951
Very interesting analysis of the possible cause of this issue. I think you are on to something there.

I sent a link to your post to Igor, who probably will give it to Dylan.
[Dec 1, 2025 10:52:04 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Offline
Re: Project Status (First Post Updated)

[Quote: Grumpy Swede]
@alanb1951
Very interesting analysis of the possible cause of this issue. I think you are on to something there.

I sent a link to your post to Igor, who probably will give it to Dylan.
[End quote]

If so, I do hope Dylan doesn't think that I'm automatically assuming he didn't know that already :-) -- I only posted the speculation because I get restless when there's no obvious explanation out there!

It's the same when I sometimes post some fairly detailed status analysis because I'm frustrated by the lack of anything faintly resembling server status data (albeit that's a problem inherited from IBM...)

Cheers - Al.

P.S. I used to dislike "armchair DBAs/SysAdmins" when I was working -- I try to not be one myself, but...
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Dec 1, 2025 11:16:26 PM]
[Dec 1, 2025 11:09:23 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2492
Status: Offline
Re: Project Status (First Post Updated)

I'm sure, Alan, that Dylan will not take it the wrong way.

New tasks are coming in here now. Since I only run this project, I'd better make sure that I have work for at least 2 days in the cache.
[Dec 1, 2025 11:29:13 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Offline
Re: Project Status (First Post Updated)

[Quote: Grumpy Swede]
New tasks are coming in here now. Since I only run this project, I'd better make sure that I have work for at least 2 days in the cache.
[End quote]

Yes -- that latest "other platforms" spell seems to have run from some time between 20:50 and 21:38 UTC to some time between 23:15 and 23:20.

Interestingly, everything I received up to 20:50 was new work, then everything that I received between 23:22 and 23:26 was a retry; after that (23:29 and onwards) everything is new work again. Wondering about that wrong O/S thing, I had a look at a few of the retries I'm dealing with, but at first glance it looks as if the retries were for overdue Darwin tasks. (I might come back to that later once I can see a full list after my next wingmen script runs.)

Also, some of the retries were O/S type "Linux" and O/S version "Docker Desktop" -- it'll be interesting to see what happens to those at validation time...

Cheers - Al.
[Dec 2, 2025 2:05:36 AM]
Paul Schlaffer
Senior Cruncher
USA
Joined: Jun 12, 2005
Post Count: 278
Status: Offline
Re: Project Status (First Post Updated)

If WSL is causing a problem it's important to know. I'm one of those who just enabled it when updating to 8.2.8. I do see in the task detail that the OS is incorrectly reported. Upon startup, though, the correct OS shows in the Event Log along with the available WSL Distro, so you'd think this wouldn't be an issue.
----------------------------------------
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
[Dec 2, 2025 6:22:36 AM]
Hans Sveen
Veteran Cruncher
Norge
Joined: Feb 18, 2008
Post Count: 981
Status: Offline
Re: Project Status (First Post Updated)

My guess is it has "something" to do with WSL,
take a look at this workunit:

https://www.worldcommunitygrid.org/contributi...071_9340,2,-Result%20name
----------------------------------------
[Edit 2 times, last edit by Hans Sveen at Dec 2, 2025 8:46:25 AM]
[Dec 2, 2025 7:48:45 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1316
Status: Offline
Re: Project Status (First Post Updated)

I've seen some of those bad O/S type fields (as in Hans Sveen's example above) when a pair of Windows hosts failed to download and Linux got a look in for the retries! That bad O/S type was always associated with client 8.2.4, for what that might be worth. (And I had to modify my collection script because that field was actually multi-line and blew up my scanner logic!)
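
A multi-line field like the one mentioned above can be flattened before any line-based scanning. The following is a minimal sketch, not the collection script referred to in the post; the function name `normalise_field` and the sample value are hypothetical.

```python
def normalise_field(raw: str) -> str:
    """Collapse every run of whitespace, including newlines, into one space."""
    # str.split() with no arguments splits on any whitespace and drops
    # empty pieces, so leading/trailing whitespace disappears as well.
    return " ".join(raw.split())


# Hypothetical example of a field value spread over several lines.
sample = "Linux\n   Alpine Linux  \n"
flattened = normalise_field(sample)
```

After this step each field occupies a single line, so a scanner that assumes one field per line no longer breaks on it.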

Anyway, I've just had a quick skim through recent workunits for which I've sent in a result or had a download error, and I can see quite a few viable WSL2 examples as wingmen amongst my validated WUs. For instance my wingman for MCM1_0243131_2048 (WU ID 783060213) reports as Linux Ubuntu with a microsoft-standard-WSL2 kernel version and it has run a Linux executable as one would hope! And that's a typical example.
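
Spotting such wingmen in collected result data could be done with a simple string test. This is a sketch only: the function name `looks_like_wsl2` is hypothetical, the sample version strings are illustrative rather than taken from any result, and it assumes the kernel version text contains the `microsoft-standard-WSL2` marker quoted above.

```python
def looks_like_wsl2(kernel_version: str) -> bool:
    """Return True if the kernel version string carries the WSL2 marker."""
    # Compare case-insensitively in case the field has been re-cased.
    return "microsoft-standard-wsl2" in kernel_version.lower()


# Illustrative inputs (assumed formats, not real result data).
wsl_host = looks_like_wsl2("5.15.153.1-microsoft-standard-WSL2")
native_host = looks_like_wsl2("6.8.0-45-generic")
```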

I've also seen a fair number of Alpine Linux wingmen; in some cases where my task failed to download I've even seen a pair of these validate against one another, both having run a Windows executable (because they reported a Windows platform, whatever the O/S information might have said at the time I sampled the associated result data!) -- for instance, MCM1_0242404_6801 (WU ID 776714012).
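
Cases like these, where the reported platform and the reported O/S disagree, could be flagged with a check along the following lines. It is a sketch under assumptions: the helper names are hypothetical, the inputs are taken to be the platform string (for example `windows_x86_64` or `x86_64-pc-linux-gnu`) and the O/S name as displayed with a result, and anything unrecognised is deliberately left unflagged.

```python
def platform_family(platform: str) -> str:
    """Coarse O/S family implied by a BOINC-style platform name."""
    p = platform.lower()
    if p.startswith("windows"):
        return "windows"
    if "linux" in p:
        return "linux"
    if "darwin" in p:
        return "darwin"
    return "unknown"


def os_family(os_name: str) -> str:
    """Coarse O/S family implied by the reported O/S name."""
    n = os_name.lower()
    if "windows" in n:
        return "windows"
    if "darwin" in n:
        return "darwin"
    if "linux" in n:
        return "linux"
    return "unknown"


def is_mismatch(platform: str, os_name: str) -> bool:
    """True only when both families are recognised and they differ."""
    pf, of = platform_family(platform), os_family(os_name)
    return "unknown" not in (pf, of) and pf != of
```

For example, `is_mismatch("windows_x86_64", "Linux Alpine Linux")` is true, while a Linux platform paired with a Linux O/S name is not flagged.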

And, as I mentioned in an earlier post, I've got some retries where a wingman is reporting as on Linux Docker Desktop, but I'm still waiting for any of those to return to get more details.

There are almost certainly multiple reasons for incorrect O/S reporting from Windows systems, some of which might be [user?] configuration errors but others might be client issues. As I have no Windows systems I can't experiment...

Cheers - Al.

P.S. There have been past mentions of this sort of thing in other parts of the WCG forums, and I seem to recall having seen it raised elsewhere too. Certainly, CPDN has had some stuff about getting Linux tasks on Windows platforms, and some are running a BOINC client under WSL to do so (rather than using a VM), I think...
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Dec 2, 2025 9:35:24 AM]
[Dec 2, 2025 9:30:47 AM]
Paul Schlaffer
Senior Cruncher
USA
Joined: Jun 12, 2005
Post Count: 278
Status: Offline
Re: Project Status (First Post Updated)

[Quote: Hans Sveen]
My guess is it has "something" to do with WSL,
take a look at this workunit:

https://www.worldcommunitygrid.org/contributi...071_9340,2,-Result%20name
[End quote]

The one at the top appears to be due to an installation error. I have had no MCM errors or invalids since installing it over the weekend.
However, since the validator appears to match OS with OS (all the ones I've looked at have Alpine Linux matches), I could see this slowing things down, because that O/S is less common than plain Windows. Running the top command, I don't see MCM actually running on WSL, and I wouldn't expect it to. At this point, I'm leaning toward uninstalling the feature.
[Dec 2, 2025 2:02:43 PM]