Thread Status: Active
Total posts in this thread: 65
Posts: 65   Pages: 7   [ Previous Page | 1 2 3 4 5 6 7 | Next Page ]
This topic has been viewed 97763 times and has 64 replies
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 926
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

Most of my "Waiting to be sent" tasks are from the cluster, with issue dates of September 19-23.

[*2] I'd love to know what happened there, but I don't reckon the cluster manager(s) will come and post about it :-)

If they're a big enough volunteer to have those resources, one would think they participate in the forums, but I dunno.
Yes, but if there is a learning experience there it might be of a type that's embarrassing enough to cause an unwillingness to share :-)

Personally, I'd like to see a FAQ on the best way to set up multi-node systems to get the best out of BOINC (and WCG in particular) without the risk of consequences such as have been seen in this case. I know little or nothing about the technical side of cluster set-up, but I do know that WCG profiles can be used to avoid grabbing more work than a node can process in the relatively short lifetime some nodes seem to have. That matters when it's not possible to do a clean detach (releasing unfinished tasks) when a node goes away!

with 1456 different device names.

How can you tell that? Very interesting.

As seems to be quite common with both clustered nodes and one-off clients in large computing environments, device names tend to have a pattern which includes a node identifier as part of a [much] longer name. How to find out the device names was discussed quite early in this thread.

Given that WCG doesn't follow the typical BOINC pattern of allowing the tracing of results to devices (and hence to users if not anonymous), posting actual device name details would be severely frowned upon by WCG, so I can't "show my work" in full detail :-)

Cheers - Al.

[Edit] P.S. I can understand (and respect) the privacy aspects of the WCG site -- one of the "selling points" used to be that running WCG work in a corporate environment would not expose information about users or hardware used, and that carried over when moving to the BOINC core. It's just unfortunate that it doesn't help community members getting involved in diagnostic efforts :-)
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Oct 2, 2023 4:37:27 AM]
[Oct 2, 2023 4:09:18 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 926
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

Adri,

Thanks for providing some information from a [much] larger data-set than mine :-)

I note your observation of "slow" -- for what it's worth, I've just processed another day's wingman data and not a single one of the Waiting tasks holding up my validations has been issued :-(

One good thing -- still no sign of new stuff going to Linux 5.15.107+...

Cheers - Al.
[Oct 2, 2023 8:46:43 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2143
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

Another look at all my Pending SCC1-tasks in combination with their wingmen - in other words, the workunits that they're part of - reveals:
892 out of 1067 workunits contain a task Waiting to be sent;
871 of these 892 Waiting tasks have a 5.15.107+-wingman;
of these 871 5.15.107+-wingmen (all of them No Reply):
- 63 were issued on the 19th of September,
- 116 on the 20th,
- 367 on the 21st,
- 285 on the 22nd,
- 26 on the 23rd, and
- 14 on the 24th.
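
For anyone who wants to check the arithmetic, here is a minimal Python sketch using only the counts quoted above. The variable names are made up for illustration; nothing here comes from 'wcgstats' itself.

```python
# Counts copied from the post above (Pending SCC1 workunits, 2 Oct 2023).
workunits_total = 1067
waiting = 892                 # workunits containing a task "Waiting to be sent"
waiting_with_cluster = 871    # of those, the ones with a 5.15.107+ wingman

# Issue date (day of September 2023) -> number of No Reply wingmen.
issued_per_day = {19: 63, 20: 116, 21: 367, 22: 285, 23: 26, 24: 14}

# The per-day figures should add up to the 871 quoted in the post.
per_day_total = sum(issued_per_day.values())

share_waiting = waiting / workunits_total
share_cluster = waiting_with_cluster / waiting
peak_day = max(issued_per_day, key=issued_per_day.get)

print(f"per-day total: {per_day_total}")
print(f"workunits with a Waiting task: {share_waiting:.1%}")
print(f"Waiting tasks with a 5.15.107+ wingman: {share_cluster:.1%}")
print(f"busiest issue date: {peak_day} September")
```

The per-day figures do sum to 871, so the breakdown is internally consistent.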

Adri
[Oct 2, 2023 10:50:45 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2143
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

This morning I took another look at all my Pending SCC1-tasks - hastening to add that I don't want to make a habit of this, but for the moment … -

890 out of 1166 workunits contain a task Waiting to be sent.
The two oldest ones appear to have disappeared from the list. No new Waiting ones have been added.

Adri
[Oct 3, 2023 8:51:07 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1944
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

Well, I spent a few minutes checking on this issue and here is what I found (though this unfortunately will show up late and out of context due to that spiteful moderation from WCG Towers).

I currently have one Linux host running (Linux Mint 21), and that one doesn't seem to show any such WUs "waiting to be sent".

I do see a (slowly) increasing number of WUs sitting forever (at least MUCH longer than they should) in "Pending Verification" rather than "Pending Validation".

A substantial number of those WUs are from my (slightly older) MacBook Pro, like
https://www.worldcommunitygrid.org/contribution/workunit/385297839

But about 70% of all "Waiting to be sent" WUs are on various versions of Windows (mostly Windows 10 though), like
https://www.worldcommunitygrid.org/contribution/workunit/383284650

So from my perspective, this indicates a far more general issue than just WUs coming back with "no reply" from some ominous hoarder running some specific version of Linux.

Overall, while the number of WUs in PV jail (be it validation or verification) has roughly doubled compared to the numbers they were last year (and before the transition), the ratio seems to be more or less constant at about 10% of all WUs showing in the Results page.

Ralf
[Oct 3, 2023 3:52:51 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2143
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

Adri -- thanks for confirming that someone else is seeing the "not going to send to a Ryzen at the moment" behaviour.

This morning I also saw some strange behaviour.
I found my two fast Ryzens were only containing GPU-work and that the SCC1-well had run dry for them.
When one Ryzen nearly ran out of GPU-work (there were 2 tasks left of which 1 had already finished(*1)), it started picking up MCM1 and SCC1 as well at 09:00 UTC. The second Ryzen was still holding some GPU-work and was still asking for GPU-work only, evidently refusing to ask for CPU-work (also despite updating the profile multiple times). The 7.20.2-client has been running since 31st of July:
$ ps -fuboinc
UID          PID    PPID  C STIME TTY          TIME CMD
boinc       3711       1  0 Jul31 ?        06:45:01 /usr/bin/boinc


On my (AMD) laptop, I can see that there was a small gap receiving SCC1-tasks:
<16> * SCC1_0004408_brachyury_6328_0  Linux Ubuntu  In Progress           2023-10-04T08:19:50

<17> SCC1_0004443_brachyury_92595_0 Linuxmint Pending Verification 2023-10-04T01:32:52
<17> * SCC1_0004443_brachyury_92595_1 Linux Ubuntu In Progress 2023-10-04T06:50:13

<18> * SCC1_0004318_KLF15-A_87152_0 Linux Ubuntu In Progress 2023-10-04T05:48:36


On my first Ryzen this gap is also visible, that is to say, it stopped receiving SCC1-tasks sometime after 06:07 UTC:
<12>   SCC1_0004408_brachyury_25508_0  Linux Debian  In Progress           2023-10-04T09:00:15
<12> * SCC1_0004408_brachyury_25508_1 Fedora Linux Pending Validation 2023-10-04T09:00:22

<13> SCC1_0004443_brachyury_2505_0 Linuxmint Valid 2023-10-03T20:50:09
<13> * SCC1_0004443_brachyury_2505_1 Fedora Linux Valid 2023-10-04T06:07:24


On my Intel machine, however, there is no clear gap, or else it could be found in this bit:
 <9> * SCC1_0004318_KLF15-A_91872_0  Fedora Linux  In Progress           2023-10-04T07:56:34

<10> SCC1_0004318_KLF15-A_87511_0 Linux Ubuntu Pending Verification 2023-10-04T05:49:59
<10> * SCC1_0004318_KLF15-A_87511_1 Fedora Linux In Progress 2023-10-04T07:29:31

<11> SCC1_0004319_KLF15-A_30929_0 Linuxmint Pending Verification 2023-10-03T08:21:37
<11> * SCC1_0004319_KLF15-A_30929_1 Fedora Linux In Progress 2023-10-04T07:29:31

<12> SCC1_0004447_brachyury_59899_0 Linux Ubuntu Pending Verification 2023-10-01T12:36:51
<12> * SCC1_0004447_brachyury_59899_1 Fedora Linux In Progress 2023-10-04T07:29:31

<13> SCC1_0004318_KLF15-A_77839_0 Linux Debian Pending Verification 2023-10-04T05:30:30
<13> * SCC1_0004318_KLF15-A_77839_1 Fedora Linux In Progress 2023-10-04T07:04:27
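
Rather than eyeballing a listing for a gap, the lines can be parsed and the widest interval between one's own tasks picked out. This is only a rough Python sketch. It assumes the line layout shown above ("*" marking your own task, one ISO timestamp at the end), that the timestamp of an In Progress task is the time it was sent, and the list of status names in the regular expression. The helper names are invented for illustration and are not part of 'wcgstats'.

```python
import re
from datetime import datetime

# Lines copied from the Intel machine listing above.
LISTING = """\
<9> * SCC1_0004318_KLF15-A_91872_0 Fedora Linux In Progress 2023-10-04T07:56:34
<10> SCC1_0004318_KLF15-A_87511_0 Linux Ubuntu Pending Verification 2023-10-04T05:49:59
<10> * SCC1_0004318_KLF15-A_87511_1 Fedora Linux In Progress 2023-10-04T07:29:31
<11> SCC1_0004319_KLF15-A_30929_0 Linuxmint Pending Verification 2023-10-03T08:21:37
<11> * SCC1_0004319_KLF15-A_30929_1 Fedora Linux In Progress 2023-10-04T07:29:31
<12> SCC1_0004447_brachyury_59899_0 Linux Ubuntu Pending Verification 2023-10-01T12:36:51
<12> * SCC1_0004447_brachyury_59899_1 Fedora Linux In Progress 2023-10-04T07:29:31
<13> SCC1_0004318_KLF15-A_77839_0 Linux Debian Pending Verification 2023-10-04T05:30:30
<13> * SCC1_0004318_KLF15-A_77839_1 Fedora Linux In Progress 2023-10-04T07:04:27
"""

# Assumed layout: <workunit> [*] task-name OS-name status timestamp
LINE = re.compile(
    r"^\s*<(?P<wu>\d+)>\s+(?P<own>\*\s+)?(?P<task>\S+)\s+(?P<os>.+?)\s+"
    r"(?P<status>In Progress|Pending Verification|Pending Validation|Valid)\s+"
    r"(?P<when>\S+)\s*$"
)


def own_task_times(text):
    """Return the sorted timestamps of the tasks marked with '*'."""
    times = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group("own"):
            times.append(datetime.fromisoformat(match.group("when")))
    return sorted(times)


def largest_gap(times):
    """Return (gap, start, end) for the widest interval between neighbours."""
    if len(times) < 2:
        raise ValueError("need at least two timestamps to measure a gap")
    return max(((b - a, a, b) for a, b in zip(times, times[1:])),
               key=lambda item: item[0])


times = own_task_times(LISTING)
gap, start, end = largest_gap(times)
print(f"{len(times)} own tasks, widest gap {gap} between {start} and {end}")
```

With only a handful of lines this says little, but fed a full day of results it would show whether a pause really stands out from the normal spacing.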


[*1] The Error log from the second to last running job shows that it finished at 8 seconds past 09:00 UTC (11:00 localtime):
	INFO:[11:00:08] End AutoDock...
INFO:Cpu time = 855.462421
11:00:08 (1222726): called boinc_finish(0)


And what do you know, as soon as the second Ryzen ran out of (GPU-)work, it stopped asking for GPU-work only and immediately started receiving new CPU-work:

(Using output from grep 'Requesting new tasks for ' /var/log/messages; for UTC times, subtract 2 hours: 08:44 localtime = 06:44 UTC)
04-Oct-2023 08:44:06 [World Community Grid] Requesting new tasks for CPU and NVIDIA GPU
04-Oct-2023 08:46:09 [World Community Grid] Requesting new tasks for NVIDIA GPU
:
[asking for NVIDIA GPU only for more than 3 hours]
:
04-Oct-2023 12:06:10 [World Community Grid] Requesting new tasks for NVIDIA GPU
04-Oct-2023 12:08:17 [World Community Grid] Requesting new tasks for CPU and NVIDIA GPU
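
The length of the GPU-only stretch and the UTC conversion can also be worked out mechanically. A small Python sketch follows, assuming the log lines look exactly like the four quoted above (a real /var/log/messages may carry a syslog prefix that would need stripping first), that local time is UTC+2 as stated, and an English locale for the month abbreviation. The function and variable names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption from the post: local time is two hours ahead of UTC.
LOCAL = timezone(timedelta(hours=2))
MARKER = "Requesting new tasks for "

# The four lines quoted above; the elided GPU-only requests in between are omitted.
LOG = """\
04-Oct-2023 08:44:06 [World Community Grid] Requesting new tasks for CPU and NVIDIA GPU
04-Oct-2023 08:46:09 [World Community Grid] Requesting new tasks for NVIDIA GPU
04-Oct-2023 12:06:10 [World Community Grid] Requesting new tasks for NVIDIA GPU
04-Oct-2023 12:08:17 [World Community Grid] Requesting new tasks for CPU and NVIDIA GPU
"""


def parse_requests(text):
    """Yield (utc_time, requested_resources) for each work-request line."""
    for line in text.splitlines():
        if MARKER not in line:
            continue
        # The first 20 characters hold the timestamp, e.g. "04-Oct-2023 08:44:06".
        stamp = datetime.strptime(line[:20], "%d-%b-%Y %H:%M:%S")
        resources = line.split(MARKER, 1)[1].strip()
        yield stamp.replace(tzinfo=LOCAL).astimezone(timezone.utc), resources


requests = list(parse_requests(LOG))
gpu_only = [when for when, what in requests if what == "NVIDIA GPU"]
span = gpu_only[-1] - gpu_only[0]

print(f"GPU-only requests from {gpu_only[0]:%H:%M:%S} to {gpu_only[-1]:%H:%M:%S} UTC")
print(f"duration: {span}")
```

This only measures from the first to the last GPU-only request, so it relies on the stretch being uninterrupted, as described above.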

Adri
PS Glad to see it's working again here.
[Oct 4, 2023 11:19:00 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7633
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

I notice you have a number of "pending verification" units. I had looked through some of my pending units and saw there were many more "pending verification" units than normal. The "pending validation" units usually vastly outnumber the "pending verification" units but that ratio has seemed to switch to almost equal parts. I wonder if the protocols for requiring verification have changed.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Oct 4, 2023 12:58:27 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1944
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

I notice you have a number of "pending verification" units. I had looked through some of my pending units and saw there were many more "pending verification" units than normal. The "pending validation" units usually vastly outnumber the "pending verification" units but that ratio has seemed to switch to almost equal parts. I wonder if the protocols for requiring verification have changed.

Cheers
I tried to mention this at least a couple of times before, but unfortunately my posts are still being moderated and censored.

"Pending validation" doesn't increase (in relative terms), but "pending verification" does, slowly.

Ralf
[Oct 4, 2023 2:00:56 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2143
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

I notice you have a number of "pending verification" units. I had looked through some of my pending units and saw there were many more "pending verification" units than normal. The "pending validation" units usually vastly outnumber the "pending verification" units but that ratio has seemed to switch to almost equal parts. I wonder if the protocols for requiring verification have changed.

Cheers

As they say in Dutch, "meten is weten"[*1] (= 'measuring is knowing'), Sgt.Joe.
I've just found that my Linux program 'wcgstats' downloads only 250 results if you specify more than 250 results to download. This should be ideal in my case, because the Linux-cluster "5.15.107+" from late September seems to be gone and the oldest of the 250 selected results then appears to be from 1st October.
So I want to count the 250 latest results that are Pending Validation or Pending Verification only, per device. And I want to do this for SCC1-tasks only, across all my devices. There is a neat little way to do this, using the following options of 'wcgstats' (since version 2.22.9):
-w  = show workunits
-s = filter the status of a task: P = Pending Validation, Q = Pending Verification
-a = filter the name of the science app, in this case SCC1
-m = filter the predefined sequence of the machine: 0 = all devices
-P = set the pagelength (number of workunits in a page on your screen)
-L = number of results to download

And it is neat, because within a few keystrokes the answer will be visible.
Now I'm starting the session:
$ wcgstats -wsPQ -aSCC1 -m0 -SSP1 -L250
* Let's try to locate the workunit.
Loading results ...
There are 250 pages of results available.

- What is the pagenumber of the results where the task can be found? [l (last)/q (quit)/PAGENUMBER] (e.g. 4; default 1):
* Showing page 1/250 of all SCC1-tasks with status ’P/Q’ on all of your devices:
<1> SCC1_0004436_brachyury_33138_0 Linux Arch In Progress 2023-10-06T08:02:58 2023-10-12T08:02:58
<1> * SCC1_0004436_brachyury_33138_1 Fedora Linux Pending Validation 2023-10-06T08:03:22 2023-10-06T09:32:54


- Did this show the desired workunit? [Y (yes)/n (no, next)/p (previous)/l (last)/q (quit)/c (change)/* (match)/PAGENUMBER]
- Do you want to add the <e>rrorlog(s), convert the output of the workunit to <f>orumcodes before anything, <j>ust show the JSON-filename, or <s>ummarize ALL entries? [e/f/j/s/N/q (quit)] s
fastdevice1 Pending Validation 127
fastdevice1 Pending Verification 24
fastdevice2 Pending Validation 66
fastdevice2 Pending Verification 12
i7-device1 Pending Validation 3
i7-device1 Pending Verification 4
laptopdevice Pending Validation 12
laptopdevice Pending Verification 2



There you have it.
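
To turn the per-device summary into overall shares, the lines can be totalled per status. Below is a minimal Python sketch that takes the summary text exactly as printed above; the parsing assumes device names contain no spaces, which holds for the placeholder names shown but is an assumption in general.

```python
from collections import Counter

# Summary lines copied from the session above: device, status, count.
SUMMARY = """\
fastdevice1 Pending Validation 127
fastdevice1 Pending Verification 24
fastdevice2 Pending Validation 66
fastdevice2 Pending Verification 12
i7-device1 Pending Validation 3
i7-device1 Pending Verification 4
laptopdevice Pending Validation 12
laptopdevice Pending Verification 2
"""

totals = Counter()
for line in SUMMARY.splitlines():
    device, rest = line.split(maxsplit=1)      # first word is the device name
    status, count = rest.rsplit(maxsplit=1)    # last word is the count
    totals[status] += int(count)

pending = sum(totals.values())
verification_share = totals["Pending Verification"] / pending

for status, count in sorted(totals.items()):
    print(f"{status}: {count}")
print(f"Pending Verification share of all pending: {verification_share:.1%}")
```

The totals come to 208 Pending Validation and 42 Pending Verification, 250 in all, so Pending Verification is 16.8% of the pending SCC1 results.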

Adri

[*1] This is one of the most common Dutch wisdom quotes, and it represents how the Dutch like to do things: with precision and preparation, and never too hurriedly.
[Oct 6, 2023 11:37:16 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7633
Status: Offline
Re: Is there any way of finding your wingman’s host-Id?

I see you have about 16.8% of the total "pending" in "pending verification." I am running about 2.5% "pending verification" in the total "pending" category. The pending verification numbers had spiked a couple of days ago, but have now come back to a more normal level. The "pending validations" have come down but are still higher than normal. If the "5.15.107+" cluster is no longer running, then I expect the "pending validations" to return to a more normal number. Currently about 51% of the completed units are "pending validation."
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Oct 6, 2023 7:36:46 PM]
[Oct 6, 2023 7:22:25 PM]