Total posts in this thread: 70
This topic has been viewed 5945 times and has 69 replies
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1118
Status: Offline
Re: No MCM1 WU

Only resends going out at the moment for MCM and ARP. Looks like new WU generation is turned off.
[Jun 18, 2025 5:17:45 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1996
Status: Offline
Re: No MCM1 WU

Quote: "Is it just me, or is there no work anymore for both MCM1 and ARP1? None of my machines get any new workunits since yesterday evening."

No, it's not just you. Just got back in after spending the last two days pretty much out of the office all day, and a couple of hosts have completely run out of WUs; a few more will probably run dry overnight.

Let's hope that the guys in Toronto are not too hung over after Edmonton lost in the Stanley Cup final against the Florida Panthers and can fix this issue (the lack of WUs) in the morning... :-(

Ralf
[Jun 18, 2025 8:12:00 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12566
Status: Offline
Re: No MCM1 WU

Finally an MCM re-send.

Mike
[Jun 18, 2025 12:52:00 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7780
Status: Offline
Re: No MCM1 WU

Dry on all machines, probably sometime overnight. And it is not even the weekend.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Jun 18, 2025 1:13:27 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1996
Status: Offline
Re: No MCM1 WU

Quote (Sgt.Joe): "Dry on all machines, probably sometime overnight. And it is not even the weekend. Cheers"

Yup, overnight, more hosts ran dry, and then sometime after 7:30am PST, the servers came back on. And so far, as far as I had time to check, I got one ARP1 _3 resend after the system came back up. Let's see what the day brings...


Ralf
[Jun 18, 2025 3:13:58 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1067
Status: Offline
Re: No MCM1 WU

This lack of work followed the same sort of pattern as the previous two or three instances over the last month and a bit... As they haven't announced that the "MCM1 control" switch to Kubernetes has happened, more of these breaks in the flow might be expected :-(

On this occasion I got my last "new" MCM1 task around 18:00 UTC on 2025-06-17, then there was a dribble of retries (few but reasonably frequent) until around 00:50 UTC on 2025-06-18, at which point nothing more until a fairly substantial number of retries between 04:45 and 06:25 UTC, a couple more before 08:30 UTC then nothing until about 14:30 UTC, at which point some more retries showed up.

A small number of "new" tasks showed up around 15:50 UTC today and there seems to have been a small but steady supply since then. Presumably whatever was causing the issues has been resolved -- I'd really like to know what had happened :-)

Cheers - Al.

P.S. I haven't received a new ARP1 task in the last 30+ hours, during which interval I've seen exactly one retry... I suspect Windows users might have seen more retries than Linux users :-)

P.P.S. During the last 24 hours or so there have also been some other "infrastructure" issues (resulting in sluggish web server access and loss of session login). I don't know whether they have a common cause...
[Jun 18, 2025 5:22:49 PM]
catchercradle
Senior Cruncher
England
Joined: Jan 16, 2009
Post Count: 158
Status: Offline
Re: No MCM1 WU

And now back to no tasks again. I often notice there are tasks available for my Android phone when there are none for PC, but even the phone ones have dried up at the moment.
[Jun 21, 2025 4:48:04 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1067
Status: Offline
Re: No MCM1 WU

Regarding work availability (or otherwise)...

I wonder how often the "tasks are committed to other platforms" thing I see is because of a clump of Android retries (as a Linux user, I usually "blame" it on Windows users batting eyelashes...).

It might be more common than one might expect, as I suspect quite a lot of Android devices end up with [far] more work than they can cope with (and not always because of excessively large caches, given posts I've seen in various places about issues such as "It'll only run BOINC when the screen is awake...")

It would be interesting to know the missed deadline statistics for all platforms, but it's unlikely we'll see such data in the foreseeable future :-(

Cheers - Al.
[Jun 21, 2025 12:12:46 PM]
Link64
Advanced Cruncher
Joined: Feb 19, 2021
Post Count: 144
Status: Offline
Re: No MCM1 WU

Quote (alanb1951): "I wonder how often the "tasks are committed to other platforms" thing I see is because of a clump of Android retries"

The real question is: is it really necessary that both tasks for the same WU are sent to the same platform? Has that been tested? Usually this should not be necessary.
[Jun 21, 2025 1:34:57 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1067
Status: Offline
Re: No MCM1 WU

Quote (alanb1951): "I wonder how often the "tasks are committed to other platforms" thing I see is because of a clump of Android retries"

Quote (Link64): "The real question is: is it really necessary that both tasks for the same WU are sent to the same platform? Has that been tested? Usually this should not be necessary."

I can't prove it, but I would have thought that when WCG was at IBM that necessity would have been tested quite thoroughly for each project. All the current projects were set up, Beta tested and put into production by Kevin Reed and company (and I sometimes wish we could still access their knowledge...)

And your "usually" surely depends quite heavily on what sort of precision of result matching is required, which will depend on the methodology employed by the research!

[You will undoubtedly be familiar with some of what follows - sorry about that :-)]

Using homogeneous redundancy (HR) helps reduce the likely spread of results that would come from running the application on different hardware or because of compilation or run-time library differences between different Operating Systems. Even on a single "platform" there may be some drift in some circumstances, but it is likely to be far less significant than the drift between platforms...
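
To make the HR idea concrete, here is a minimal sketch of how a work unit might get "committed" to one class of host once its first task is sent. Everything in it is an assumption for illustration: the `WorkUnit` type, the `send` helper and the plain platform strings are hypothetical, and a real scheduler is likely to define its HR classes in more detail than a single OS name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkUnit:
    """Hypothetical work unit record; not a real BOINC structure."""
    name: str
    hr_class: Optional[str] = None          # unset until the first task is sent
    hosts: List[str] = field(default_factory=list)


def can_send(wu: WorkUnit, host_platform: str) -> bool:
    # A host qualifies if the WU is still uncommitted,
    # or if it is already committed to the host's own class.
    return wu.hr_class is None or wu.hr_class == host_platform


def send(wu: WorkUnit, host_id: str, host_platform: str) -> bool:
    """Try to assign a task of this WU to a host; return True on success."""
    if not can_send(wu, host_platform):
        return False
    wu.hr_class = host_platform             # first send commits the WU
    wu.hosts.append(host_id)
    return True


wu = WorkUnit("example_wu")                 # made-up name
first = send(wu, "phone-1", "android")      # commits the WU to "android"
refused = send(wu, "pc-1", "linux")         # other platforms are now refused
retry = send(wu, "phone-2", "android")      # a retry can only go to the same class
```

Under this toy model, a clump of retries for work first sent to one platform is invisible to hosts on every other platform, which is one way the "tasks are committed to other platforms" message could arise.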

In some cases, the nature of the workload may be such that there's likely to be an unacceptable amount of drift between results from different platforms because of compiler variations resulting in arithmetic expressions having different execution order, not to mention FPU rounding behaviour; most results would probably bracket the "truth" fairly closely (especially on a per-platform basis!), but if a "high" extreme from one platform were paired with a "low" extreme from another platform the difference might be too large, which could lead to higher retry counts!
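
The execution-order point can be shown in a few lines. This is only an illustration of floating-point addition not being associative, using made-up values; it says nothing about what any particular project's application actually computes.

```python
import math

values = [0.1, 0.2, 0.3]

# The same three numbers, summed in two different orders, as two compilers
# might legitimately arrange the same source expression.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

orders_differ = left_to_right != right_to_left
close_enough = math.isclose(left_to_right, right_to_left, rel_tol=1e-12)

print(left_to_right, right_to_left, orders_differ, close_enough)
```

The two sums differ in the last bit yet agree to well within any sensible tolerance. Over millions of operations such differences can accumulate, which is where the per-platform drift comes from.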

There are probably lots of BOINC projects that don't need high precision because of the nature of what they're doing, either because they only need an approximation or because their methods have built-in precision control (integer arithmetic or non-arithmetic methods?). Their configuration choices are more flexible!

Now, to WCG projects...

Some WCG projects have actually used Adaptive Replication, presumably when the methodology is somewhat "scatter gun" with lots of different control parameters used to examine a single target, similar to CPDN where small tweaks in parameters are used to get a statistically acceptable simulation from a given starting point. (Such projects typically used some variant of Autodock software.)

If a project uses AR, it more or less has to use homogeneous redundancy (HR) as a way of checking that a "trusted" host is actually producing results likely to be viable!
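
As a rough picture of what AR means in practice, the toy model below decides how many copies of a work unit to issue based on a host's recent track record. It is a sketch under stated assumptions, not the actual BOINC or WCG algorithm: the function name, the threshold of 10 consecutive valid results and the 10% spot-check rate are all hypothetical.

```python
import random
from typing import Callable


def replicas_needed(
    consecutive_valid: int,
    trust_threshold: int = 10,       # hypothetical value
    spot_check_rate: float = 0.1,    # hypothetical value
    rng: Callable[[], float] = random.random,
) -> int:
    """Return how many copies to issue when the first task goes to this host."""
    if consecutive_valid < trust_threshold:
        return 2                     # untrusted host: always cross-check
    if rng() < spot_check_rate:
        return 2                     # trusted host: occasional spot check
    return 1                         # trusted host: single copy is accepted


new_host = replicas_needed(3)
trusted_host = replicas_needed(50, rng=lambda: 0.5)
spot_checked = replicas_needed(50, rng=lambda: 0.01)
```

The spot check is the part that leans on HR: the second copy is only a useful test of the "trusted" host if it runs somewhere that should produce a matching result.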

The ARP1 project has to use HR because the validation method for the huge data files uses a binary comparison method (I believe they use checksumming rather than bitwise compare); there might be subtle differences between the results from different platforms that would break that (although the results from either platform would actually be in a viable range!)
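
The difference between the two validation styles can be sketched as follows. The SHA-256 digest, the `struct` packing and the tolerance value are assumptions chosen for the example; the actual ARP1 validator is not described in this thread beyond "binary comparison".

```python
import hashlib
import math
import struct
from typing import Sequence


def checksum(values: Sequence[float]) -> str:
    """Digest of the raw little-endian bytes of the doubles."""
    payload = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(payload).hexdigest()


def within_tolerance(a: Sequence[float], b: Sequence[float],
                     rel_tol: float = 1e-9) -> bool:
    """Element-wise comparison that forgives tiny numerical drift."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


# Two "results" that differ only in the last bit of one element.
result_a = [1.0, 2.5, 0.1 + (0.2 + 0.3)]
result_b = [1.0, 2.5, (0.1 + 0.2) + 0.3]

binary_match = checksum(result_a) == checksum(result_b)
tolerant_match = within_tolerance(result_a, result_b)
```

Here `binary_match` is false while `tolerant_match` is true: both results are numerically fine, but a byte-level validator would reject the pair, which is why such a scheme needs both tasks to run on hosts that produce identical bits.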

I don't know how precise the per-WU validation requirements are for MCM1, so I don't know how much drift might be acceptable. If the answer is "not a lot" then HR will be used anyway!

That said, sometimes even HR doesn't help -- we've seen some recent ARP1 work that has caused validation issues on Apple systems, with some cases failing completely, and others only appearing to validate if a pair of M4 systems happened to run the WU.

[Note also that WCG does not seem to support Anonymous Platform - if the application isn't built in-house it's not going to get used...]

Now I'll wait to see if someone tells me I'm talking rubbish (and explains why!)...

Cheers - Al.

P.S. Somewhere like MilkyWay or Einstein often has validation issues because HR is not in use. I think MW used to use AR for Separation at one stage (possibly without HR) but at some point that seemed to stop. I don't think I ever saw AR for N-body. (Einstein does use AR for at least one of its non-GPU applications...)
[Jun 21, 2025 5:21:55 PM]