World Community Grid Forums
Category: Active Research Forum: Smash Childhood Cancer Thread: Is there any way of finding your wingman’s host-Id? |
Thread Status: Active Total posts in this thread: 65
Bryn Mawr
Senior Cruncher Joined: Dec 26, 2018 Post Count: 337 Status: Offline Project Badges: |
I’m having a build up of pending validation tasks and well over half of them are SCC WUs on my slowest cruncher (a 4th gen i3 laptop so definitely slow).
Looking at them in a bit more detail almost all of them appear to have the same wingman who does not appear to be returning any of them as complete. Is there any way I can find a host-id to confirm that it is the same host and carry my research forwards? |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12146 Status: Offline Project Badges: |
We all get PV delays from time to time. There are quite a few in my results listing, but I can't identify the wingman's machine - just the OS of it.
After 6 days it will either be returned or sent to another cruncher.
Mike |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7580 Status: Recently Active Project Badges: |
Bryn Mawr wrote: "I’m having a build up of pending validation tasks and well over half of them are SCC WUs on my slowest cruncher (a 4th gen i3 laptop so definitely slow)."
Mike is right. With these small SCC units there is a buildup of these. I have over 1000 in that category and am sure they will be validated eventually. Just keep on crunching.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
Bryn Mawr
Senior Cruncher Joined: Dec 26, 2018 Post Count: 337 Status: Offline Project Badges: |
Mike.Gibson wrote: "We all get PV delays from time to time. There are quite a few in my results listing, but I can't identify the wingman's machine - just the OS of it. After 6 days it will either be returned or sent to another cruncher. Mike"
I’m quite used to getting PVs but this is unusual.
Normally they are slightly skewed towards my faster machines, where the wingman takes longer than I do to process; this is heavily skewed to my slowest machine.
Normally they tend to be MCM but these are mostly SCC.
Normally there is a spread of wingmen; in this instance almost all are a single OS, Linux 5.15.107+ (rather than, for example, Ubuntu 22.04 LTS which is where 5.15 went).
It just feels different and I’m curious.
[Edit 1 times, last edit by Bryn Mawr at Sep 19, 2023 2:53:12 AM] |
||
|
supdood
Senior Cruncher USA Joined: Aug 6, 2015 Post Count: 333 Status: Offline Project Badges: |
I'm seeing the same thing. I usually have very low PV numbers as I now run only two old, slow laptops and am often the one keeping others' tasks in PV. Now I have a bunch (both MCM and SCC) in PV all with Linux 5.15.107+. I wonder if someone spun up a large compute core and got way too many tasks as an initial download while WCG learns the system.
-------------------------------------------------------------------------------- [Edit 2 times, last edit by supdood at Sep 19, 2023 11:36:51 AM] |
||
|
Bryn Mawr
Senior Cruncher Joined: Dec 26, 2018 Post Count: 337 Status: Offline Project Badges: |
supdood wrote: "I'm seeing the same thing. I usually have very low PV numbers as I now run only two old, slow laptops and am often the one keeping others' tasks in PV. Now I have a bunch (both MCM and SCC) in PV all with Linux 5.15.107+. I wonder if someone spun up a large compute core and got way too many tasks as an initial download while WCG learns the system."
This is my suspicion. It appears to be coming back for more tasks frequently without completing any. |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12146 Status: Offline Project Badges: |
supdood wrote: "I'm seeing the same thing. I usually have very low PV numbers as I now run only two old, slow laptops and am often the one keeping others' tasks in PV. Now I have a bunch (both MCM and SCC) in PV all with Linux 5.15.107+. I wonder if someone spun up a large compute core and got way too many tasks as an initial download while WCG learns the system."
Bryn Mawr wrote: "This is my suspicion. It appears to be coming back for more tasks frequently without completing any."
That also points to a new machine starting up or re-starting. Cache set to unlimited but restricted on download each time.
Mike |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 873 Status: Offline Project Badges: |
I've been looking through my wingman data for the period from 10th September onwards, as that's when I first saw wingmen with O/S 5.15.107+ (and nothing further)... There is evidence that the systems involved are part of a large node of some form.[*1]
I'm not sure how much of it is a case of "more than can be run" and how much is that there may be a huge number of individual hosts being set up -- I've seen 616 of these as wingmen across MCM1 and SCC1 between 10th and 19th September and there were 591 distinct device names! -- If nothing else, that must be causing havoc with the non-BOINC database :-)
The comments by others regarding picking up [far] more work than is being run are accurate if my wingmen return rates (almost none!) are anything to go by. As there is no evidence so far that these systems will return "Not Started by Deadline", it is possible that they aren't staying on-line, and that new tasks are being picked up by new nodes!
Someone who knows a lot more than I do about cloud computing and/or Docker (and such like) may have a better idea of what might actually be going on :-) -- I suspect the WCG folks might need to be looking at this anyway.
Unfortunately, I'm now starting to see these systems picking up retries themselves :-(
Cheers - Al.
P.S. Unless I happen to catch one of these when it returns something with the client version in it, I can't tell whether choice of client might have anything to do with the "problem"...
[*1] The relevant information is available via the API [as device names rather than host IDs], but publishing it would be against forum rules -- there have been [temporary] bans issued for showing host names other than one's own in the past!
[Edited to alter the counts to reflect work returned on 19th September.]
[Edit 1 times, last edit by alanb1951 at Sep 20, 2023 5:30:04 AM] |
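As a rough illustration of the tally described in the post above (results seen from "5.15.107+" wingmen versus the number of distinct device names behind them), here is a minimal Python sketch. The record layout (`os`, `device_name`, `project`) and the sample values are hypothetical; the thread does not show the actual API response format, and no real device names are used.

```python
from collections import Counter

TARGET_OS = "Linux 5.15.107+"  # OS string reported in the thread


def summarize_wingmen(records, target_os=TARGET_OS):
    """Count wingman results reporting target_os and the distinct devices behind them."""
    matching = [r for r in records if r["os"] == target_os]
    devices = {r["device_name"] for r in matching}
    per_project = Counter(r["project"] for r in matching)
    return {
        "results": len(matching),
        "distinct_devices": len(devices),
        "per_project": dict(per_project),
    }


# Made-up sample records, only to show the assumed shape of the input.
sample = [
    {"os": "Linux 5.15.107+", "device_name": "node-a", "project": "SCC1"},
    {"os": "Linux 5.15.107+", "device_name": "node-b", "project": "MCM1"},
    {"os": "Linux 5.15.107+", "device_name": "node-a", "project": "SCC1"},
    {"os": "Ubuntu 22.04 LTS", "device_name": "laptop", "project": "MCM1"},
]

summary = summarize_wingmen(sample)
print(summary)
```

When the number of distinct devices is close to the number of results (591 against 616 in the post), that would be consistent with many short-lived hosts rather than a few overloaded ones, though the counts alone cannot confirm it.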
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7580 Status: Recently Active Project Badges: |
I am seeing quite a large jump in pending validation and verification work units for SCC. I am seeing almost no resends. I am wondering if the volume of these real short work units is tending to overload the validator(s). The last 3 days have shown over 1 million (10^6) work units being returned for SCC.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 873 Status: Offline Project Badges: |
Sgt.Joe wrote: "I am seeing quite a large jump in pending validation and verification work units for SCC. I am seeing almost no resends. I am wondering if the volume of these real short work units is tending to overload the validator(s). The last 3 days have shown over 1 million (10^6) work units being returned for SCC. Cheers"
If this is on your Linux systems I suspect you are simply seeing a consequence of the issues with those systems that report their O/S as Linux 5.15.107+ :-(
I've just processed my wingman data for 19th September, and it shows a continuing trend of giving me _1 tasks as wingman to a _0 task on one of those systems. Based on what's going on with MCM1 tasks, those systems don't seem to return much, if anything; I'm starting to see their MCM1 tasks going No Reply (at last, but there are so many of them...)
Those systems, being new and not [yet] sending work back to validate, will always need an initial wingman for SCC1, and if you draw that short straw and then return your _1 task in a timely fashion, it'll end up sat at Pending Validation for 6 or more days. I've currently got 435 SCC1 tasks Pending Validation and over 400 of them are waiting for a "5.15.107+" system!
As for Pending Verification -- I currently have 6 SCC1 tasks Pending Verification; since SCC1 work resumed I've had 1349 tasks where I got first call, and 1312 of them validated without a second opinion. Of the other 37, most had a Pending Verification phase, but I reckon that's well within expectations. Fortunately, I haven't drawn one of those systems for my verification task yet :-) -- however, the latest one has been stuck at Waiting to send for over 5 hours so I suspect there's a little congestion in the system as the number of delayed tasks builds up and queries take longer...
Frustrating, isn't it??? :-)
Cheers - Al. |
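The proportions implied by the counts in the post above can be checked with a few lines of Python. This sketch uses only the figures quoted there; "over 400" is taken as exactly 400, so the last share is a lower bound.

```python
first_call = 1349        # SCC1 tasks where the poster got first call
validated_alone = 1312   # validated without a second opinion
needed_second = first_call - validated_alone

pending_validation = 435  # SCC1 tasks currently Pending Validation
waiting_on_target = 400   # "over 400" in the post, so a lower bound

share_alone = validated_alone / first_call
share_waiting = waiting_on_target / pending_validation

print(f"needed a second opinion: {needed_second}")
print(f"validated alone: {share_alone:.1%}")
print(f"Pending Validation waiting on 5.15.107+: at least {share_waiting:.1%}")
```

On these figures, the Pending Verification cases are a small fraction of the first-call tasks, while the large majority of the Pending Validation backlog is tied to the one OS string.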
||
|
|