World Community Grid Forums
Thread Status: Active Total posts in this thread: 65
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 926 Status: Offline |
Most of my "Waiting to be sent" are from September 19-23 issue date from the cluster.

Yes, but if there is a learning experience there it might be of a type that's embarrassing enough to cause an unwillingness to share :-)[*2]

I'd love to know what happened there, but I don't reckon the cluster manager(s) will come and post about it :-)

If they're a big enough volunteer to have those resources, one would think they participate in the forums, but I dunno.

Personally, I'd like to see a FAQ on the best way to set up multi-node systems to get the best out of BOINC (and WCG in particular) without the risk of consequences such as have been seen in this case. I know little or nothing about the technical side of cluster set-up but I do know WCG profiles can be used to avoid grabbing more work than can be processed in the relatively short lifetime some nodes seem to have in case it's not possible to do a clean detach (releasing unfinished tasks) when a node goes away!

with 1456 different device names.

How can you tell that? Very interesting.

As seems to be quite common with both clustered nodes and one-off clients in large computing environments, device names tend to have a pattern which includes a node identifier as part of a [much] longer name. How to find out the device names was discussed quite early in this thread.

Given that WCG doesn't follow the typical BOINC pattern of allowing the tracing of results to devices (and hence to users if not anonymous), posting actual device name details would be severely frowned upon by WCG, so I can't "show my work" in full detail :-)

Cheers - Al.

[Edit] P.S. I can understand (and respect) the privacy aspects of the WCG site -- one of the "selling points" used to be that running WCG work in a corporate environment would not expose information about users or hardware used, and that carried over when moving to the BOINC core. It's just unfortunate that it doesn't help community members getting involved in diagnostic efforts :-)

[Edit 1 times, last edit by alanb1951 at Oct 2, 2023 4:37:27 AM] |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 926 Status: Offline |
Adri,
Thanks for providing some information from a [much] larger data-set than mine :-) I note your observation of "slow" -- for what it's worth, I've just processed another day's wingman data and not a single one of the Waiting tasks holding up my validations has been issued :-( One good thing -- still no sign of new stuff going to Linux 5.15.107+... Cheers - Al. |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2143 Status: Offline |
Another look at all my Pending SCC1-tasks in combination with their wingmen - in other words, the workunits that they're part of - reveals:
892 out of 1067 workunits contain a task Waiting to be sent;
871 of these 892 Waiting tasks have a 5.15.107+-wingman;
of these 871 5.15.107+-wingmen (all of them No Reply):
- 63 were issued on the 19th of September,
- 116 on the 20th,
- 367 on the 21st,
- 285 on the 22nd,
- 26 on the 23rd, and
- 14 on the 24th.

Adri |
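The tally above can be cross-checked with a few lines of Python. This is only a minimal sketch: the numbers are copied from the post, while the variable names and the `share` helper are made up for illustration.

```python
# Counts copied from the post: No Reply 5.15.107+ wingmen by issue day (September 2023).
issued_by_day = {19: 63, 20: 116, 21: 367, 22: 285, 23: 26, 24: 14}

total_workunits = 1067        # pending SCC1 workunits examined
waiting = 892                 # workunits with a task Waiting to be sent
with_cluster_wingman = 871    # of those, workunits with a 5.15.107+ wingman


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# The per-day counts should add up to the stated number of wingmen.
per_day_total = sum(issued_by_day.values())
busiest_day = max(issued_by_day, key=issued_by_day.get)

print("per-day total:", per_day_total)
print("waiting share of workunits (%):", share(waiting, total_workunits))
print("cluster share of waiting (%):", share(with_cluster_wingman, waiting))
print("busiest issue day (September):", busiest_day)
```

The per-day counts add up to the 871 quoted, and the 21st is the day on which most of the affected tasks were issued.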
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2143 Status: Offline |
This morning I took another look at all my Pending SCC1-tasks - hastening to add that I don't want to make a habit of this, but for the moment … -
890 out of 1166 workunits contain a task Waiting to be sent. The two oldest ones appear to have disappeared from the list. No new Waiting ones have been added. Adri |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1944 Status: Offline |
Well, I spent a few minutes checking on this issue and here is what I found (though this unfortunately will show up late and out of context due to that spiteful moderation from WCG Towers).

I currently have one Linux host running (Linux Mint 21) and that one doesn't seem to show any such WUs "waiting to be sent". I do see a (slowly) increasing number of WUs sitting forever (at least MUCH longer than they should) in "Pending Verification" rather than "Pending Validation".

A substantial number of those WUs are from my (slightly older) MacBook Pro, like https://www.worldcommunitygrid.org/contribution/workunit/385297839

But about 70% of all "Waiting to be sent" WUs are on various versions of Windows (mostly Windows 10 though), like https://www.worldcommunitygrid.org/contribution/workunit/383284650

So from my perspective, this indicates a far more general issue than just WUs coming back with "no reply" from some ominous hoarder running some specific version of Linux.

Overall, while the number of WUs in PV jail (be it validation or verification) has roughly doubled compared to last year's numbers (and before the transition), the ratio seems to be more or less constant at about 10% of all WUs showing in the Results page.

Ralf |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2143 Status: Offline |
Adri -- thanks for confirming that someone else is seeing the "not going to send to a Ryzen at the moment" behaviour.

This morning also some strange behaviour. I found my two fast Ryzens were only containing GPU-work and that the SCC1-well had run dry for them. When one Ryzen nearly ran out of GPU-work (there were 2 tasks left of which 1 had already finished(*1)), it started picking up MCM1 and SCC1 as well at 09:00 UTC. The second Ryzen was still holding some GPU-work and was still asking for GPU-work only, evidently refusing to ask for CPU-work (also despite updating the profile multiple times).

The 7.20.2-client has been running since 31st of July:
$ ps -fuboinc

On my (AMD) laptop, I can see that there was a small gap receiving SCC1-tasks:
<16> * SCC1_0004408_brachyury_6328_0 Linux Ubuntu In Progress 2023-10-04T08:19:50

On my first Ryzen this gap is also visible, that is to say, it stopped receiving SCC1-tasks sometime after 06:07 UTC:
<12> SCC1_0004408_brachyury_25508_0 Linux Debian In Progress 2023-10-04T09:00:15

On my Intel machine, however, there is no clear gap, or else it could be found in this bit:
<9> * SCC1_0004318_KLF15-A_91872_0 Fedora Linux In Progress 2023-10-04T07:56:34

[*1] The Error log from the second to last running job shows that it finished at 8 seconds past 09:00 UTC (11:00 localtime):
INFO:[11:00:08] End AutoDock...

And what do you know, as soon as the second Ryzen ran out of (GPU-)work, it stopped asking for GPU-work only and immediately started receiving new CPU-work:
(Using output from grep 'Requesting new tasks for ' /var/log/messages; for UTC times, subtract 2 hours: 08:44 localtime = 06:44 UTC)
04-Oct-2023 08:44:06 [World Community Grid] Requesting new tasks for CPU and NVIDIA GPU
:
[more than 3 hours long asking for NVIDIA GPU only]
:
04-Oct-2023 12:06:10 [World Community Grid] Requesting new tasks for NVIDIA GPU

Adri

PS Glad to see it's working again here. |
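For anyone who wants to repeat the log check without converting times by hand, here is a rough Python sketch of the same idea as the grep above. It assumes the message format shown in the post and the 2-hour local-to-UTC offset mentioned there. The names `parse_requests` and `gpu_only` are hypothetical, and only the two log lines quoted in the post are used as input.

```python
import re
from datetime import datetime, timedelta

# Matches the BOINC client messages quoted in the post above.
REQUEST_RE = re.compile(
    r"(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[World Community Grid\] Requesting new tasks for (.+?)\s*$"
)


def parse_requests(lines, utc_offset_hours=2):
    """Return a list of (utc_time, resources) for each work-request line."""
    requests = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a work-request line
        # %b expects English month abbreviations (C/English locale assumed).
        local = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
        utc = local - timedelta(hours=utc_offset_hours)
        requests.append((utc, match.group(2)))
    return requests


def gpu_only(requests):
    """Keep the times of requests that asked for NVIDIA GPU work and nothing else."""
    return [when for when, resources in requests if resources == "NVIDIA GPU"]


# The two lines quoted in the post; everything in between was elided there.
sample = [
    "04-Oct-2023 08:44:06 [World Community Grid] Requesting new tasks for CPU and NVIDIA GPU",
    "04-Oct-2023 12:06:10 [World Community Grid] Requesting new tasks for NVIDIA GPU",
]

requests = parse_requests(sample)
for when, resources in requests:
    print(when.isoformat(), "UTC:", resources)
```

Because the pattern is applied with `search`, a syslog prefix in front of the timestamp (as lines in /var/log/messages may have) should not get in the way. Feeding the function the full grep output instead of `sample` would list every request, which makes a long GPU-only stretch easy to spot.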
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7633 Status: Offline |
I notice you have a number of "pending verification" units. I had looked through some of my pending units and saw there were many more "pending verification" units than normal. The "pending validation" units usually vastly outnumber the "pending verification" units but that ratio seems to have switched to almost equal parts. I wonder if the protocols for requiring verification have changed.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1944 Status: Offline |
I notice you have a number of "pending verification" units. I had looked through some of my pending units and saw there were many more "pending verification" units than normal. The "pending validation" units usually vastly outnumber the "pending verification" units but that ratio seems to have switched to almost equal parts. I wonder if the protocols for requiring verification have changed.

I tried to mention this at least a couple of times before, but unfortunately my posts are still being moderated and censored.

Cheers

"Pending validation" doesn't increase (in relative terms), but "pending verification" does, slowly.

Ralf |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2143 Status: Offline |
I notice you have a number of "pending verification" units. I had looked through some of my pending units and saw there were many more "pending verification" units than normal. The "pending validation" units usually vastly outnumber the "pending verification" units but that ratio seems to have switched to almost equal parts. I wonder if the protocols for requiring verification have changed.

Cheers

As they say in Dutch, "meten is weten"[*1] (= 'measuring is knowing'), Sgt.Joe.

I've just found that my Linux program 'wcgstats' downloads only 250 results if you specify more than 250 results to download. This should be ideal in my case, because the Linux-cluster "5.15.107+" from late September seems gone and my oldest result from the 250 selected ones appears to be from 1st October then.

So I want to count the 250 latest results that are Pending Validation or Pending Verification only, per device. And I want to do this for SCC1-tasks only, across all my devices.

There is a neat little way to do this, using the following options of 'wcgstats' (since version 2.22.9):
-w = show workunits

And it is neat, because within a few keystrokes the answer will be visible. Now I'm starting the session:
$ wcgstats -wsPQ -aSCC1 -m0 -SSP1 -L250

There you have it.

Adri

(*1)This is one of the most common Dutch wisdom quotes, and it represents how the Dutch like to do things: with precision and preparation, and never too hurriedly.(link) |
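The per-device count described above can also be sketched in plain Python for anyone without 'wcgstats'. The output format of 'wcgstats' is not shown in this thread, so this sketch simply assumes the results are already available as (device, status) pairs. The function names, device names and sample rows are all made up for illustration and are not real data.

```python
from collections import Counter

PENDING = ("Pending Validation", "Pending Verification")


def pending_by_device(results):
    """Count Pending Validation / Pending Verification results per device."""
    counts = {}
    for device, status in results:
        if status in PENDING:
            counts.setdefault(device, Counter())[status] += 1
    return counts


def verification_share(counts):
    """Percentage of all pending results that are Pending Verification."""
    verification = sum(c["Pending Verification"] for c in counts.values())
    total = sum(sum(c.values()) for c in counts.values())
    return 100.0 * verification / total if total else 0.0


# Hypothetical rows, only to show the shape of the input.
sample_results = [
    ("device-a", "Pending Validation"),
    ("device-a", "Pending Verification"),
    ("device-b", "Pending Validation"),
    ("device-b", "Valid"),
]

counts = pending_by_device(sample_results)
for device, per_status in sorted(counts.items()):
    print(device, dict(per_status))
print("Pending Verification share (%):", round(verification_share(counts), 1))
```

The share computed at the end is the same kind of figure that gets compared between crunchers when they ask whether "pending verification" is unusually high.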
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7633 Status: Offline |
I see you have about 16.8% of the total "pending" in "pending verification." I am running about 2.5% "pending verification" in the total "pending" category. The pending verification numbers had spiked a couple of days ago, but have now come back to a more normal level. The "pending validations" have come down but are still higher than normal. If the "5.15.107+" cluster is no longer running, then I expect the "pending validations" to return to a more normal number. Currently about 51% of the completed units are "pending validation."
----------------------------------------
Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers*
[Edit 1 times, last edit by Sgt.Joe at Oct 6, 2023 7:36:46 PM] |
||
|