World Community Grid Forums
Thread Status: Active Total posts in this thread: 70
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1118 Status: Offline |
Only resends going out at the moment for MCM and ARP. Looks like new WU generation is turned off.
|
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1996 Status: Offline |
> Is it just me, or is there no work anymore for both MCM1 and ARP1? None of my machines get any new workunits since yesterday evening.

No, it's not just you. I just got back in after spending the last two days pretty much out of the office all day, and a couple of hosts have completely run out of WUs; a few more will probably run dry overnight.
Let's hope that the guys in Toronto are not too hung over after Edmonton lost in the Stanley Cup final against the Florida Panthers and can fix this issue (the lack of WUs) in the morning...
Ralf |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12566 Status: Offline |
Finally an MCM re-send.
Mike |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7780 Status: Offline |
Dry on all machines, probably sometime overnight. And it is not even the weekend.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 1996 Status: Offline |
> Dry on all machines, probably sometime overnight. And it is not even the weekend.

Yup, overnight more hosts ran dry, and then sometime after 7:30am PST the servers came back on. So far, as far as I had time to check, I got one ARP1 _3 resend after the system came back up. Let's see what the day brings...
Cheers
Ralf |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1067 Status: Offline |
This lack of work followed the same sort of pattern as the previous two or three instances over the last month and a bit... As they haven't announced that the "MCM1 control" switch to Kubernetes has happened, more of these breaks in the flow might be expected :-(
On this occasion I got my last "new" MCM1 task around 18:00 UTC on 2025-06-17, then there was a dribble of retries (few but reasonably frequent) until around 00:50 UTC on 2025-06-18. After that there was nothing more until a fairly substantial number of retries between 04:45 and 06:25 UTC, a couple more before 08:30 UTC, then nothing until about 14:30 UTC, at which point some more retries showed up. A small number of "new" tasks showed up around 15:50 UTC today and there seems to have been a small but steady supply since then.
Presumably whatever was causing the issues has been resolved -- I'd really like to know what had happened :-)
Cheers - Al.
P.S. I haven't received a new ARP1 task in the last 30+ hours, during which interval I've seen exactly one retry... I suspect Windows users might have seen more retries than Linux users :-)
P.P.S. During the last 24 hours or so there have also been some other "infrastructure" issues (resulting in sluggish web server access and loss of session login). I don't know whether they have a common cause... |
||
|
catchercradle
Senior Cruncher England Joined: Jan 16, 2009 Post Count: 158 Status: Offline |
And now back to no tasks again. Often, I notice there are tasks available for my Android phone when none for PC but even the phone ones have dried up at the moment.
|
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1067 Status: Offline |
Regarding work availability (or otherwise)...
I wonder how often the "tasks are committed to other platforms" thing I see is because of a clump of Android retries (as a Linux user, I usually "blame" it on Windows users).
It might be more common than one might expect, as I suspect quite a lot of Android devices end up with [far] more work than they can cope with (and not always because of excessively large caches, given posts I've seen in various places about issues such as "It'll only run BOINC when the screen is awake...").
It would be interesting to know the missed deadline statistics for all platforms, but it's unlikely we'll see such data in the foreseeable future :-(
Cheers - Al. |
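For anyone wondering how a clump of retries for one platform can leave other hosts empty-handed, here is a toy sketch. It assumes the scheduler pins each workunit to the platform class of the first host that receives a copy, so that later copies (including retries) can only go to hosts of the same class. The names `WorkUnit`, `request_work` and the class labels are hypothetical and are not taken from the WCG or BOINC code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkUnit:
    name: str
    committed_class: Optional[str] = None  # platform class of the first copy sent
    unsent: int = 2                         # copies still waiting to go out


def request_work(host_class: str, queue: List[WorkUnit]) -> Optional[str]:
    """Return the name of a workunit this host may take, or None."""
    for wu in queue:
        if wu.unsent == 0:
            continue
        if wu.committed_class is None:
            # First copy: the workunit is now tied to this platform class.
            wu.committed_class = host_class
        if wu.committed_class == host_class:
            wu.unsent -= 1
            return wu.name
    # Work may exist, but all of it is committed to other platform classes.
    return None


# Only an Android retry is waiting, so a Linux host gets nothing.
queue = [WorkUnit("wu_a", committed_class="android", unsent=1)]
print(request_work("linux", queue))    # None
print(request_work("android", queue))  # wu_a
```

In this toy model the Linux host sees an empty result even though the queue is not empty, which is the situation the "committed to other platforms" message appears to describe.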
||
|
Link64
Advanced Cruncher Joined: Feb 19, 2021 Post Count: 144 Status: Offline |
> I wonder how often the "tasks are committed to other platforms" thing I see is because of a clump of Android retries

The real question is: is it really necessary that both tasks for the same WU are sent to the same platform? Has that been tested? Usually this should not be necessary. |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1067 Status: Offline |
> > I wonder how often the "tasks are committed to other platforms" thing I see is because of a clump of Android retries
> The real question is: is it really necessary, that both tasks for the same WU are sent to the same platform? Has that been tested? Usually this should not be necessary.

And your "usually" surely depends quite heavily on what sort of precision of result matching is required, which will depend on the methodology employed by the research!
[You will undoubtedly be familiar with some of what follows - sorry about that :-)]
Using homogeneous redundancy (HR) helps reduce the likely spread of results that would come from running the application on different hardware or because of compilation or run-time library differences between different Operating Systems. Even on a single "platform" there may be some drift in some circumstances, but it is likely to be far less significant than the drift between platforms...
In some cases, the nature of the workload may be such that there's likely to be an unacceptable amount of drift between results from different platforms because of compiler variations resulting in arithmetic expressions having different execution order, not to mention FPU rounding behaviour. Most results would probably bracket the "truth" fairly closely (especially on a per-platform basis!), but if a "high" extreme from one platform were paired with a "low" extreme from another platform, the difference might be too large, which could lead to higher retry counts!
There are probably lots of BOINC projects that don't need high precision because of the nature of what they're doing, either because they only need an approximation or because their methods have built-in precision control (integer arithmetic or non-arithmetic methods?). Their configuration choices are more flexible!
Now, to WCG projects...
Some WCG projects have actually used Adaptive Replication (AR), presumably when the methodology is somewhat "scatter gun" with lots of different control parameters used to examine a single target, similar to CPDN where small tweaks in parameters are used to get a statistically acceptable simulation from a given starting point. (Such projects typically used some variant of AutoDock software.) If a project uses AR, it more or less has to use homogeneous redundancy (HR) as a way of checking that a "trusted" host is actually producing results likely to be viable!
The ARP1 project has to use HR because the validation method for the huge data files uses a binary comparison method (I believe they use checksumming rather than bitwise compare); there might be subtle differences between the results from different platforms that would break that (although the results from either platform would actually be in a viable range!)
I don't know how precise the per-WU validation requirements are for MCM1, so I don't know how much drift might be acceptable. If the answer is "not a lot" then HR will be used anyway!
That said, sometimes even HR doesn't help -- we've seen some recent ARP1 work that has caused validation issues on Apple systems, with some cases failing completely, and others only appearing to validate if a pair of M4 systems happened to run the WU.
[Note also that WCG does not seem to support Anonymous Platform - if the application isn't built in-house it's not going to get used...]
Now I'll wait to see if someone tells me I'm talking rubbish (and explains why!)...
Cheers - Al.
P.S. Somewhere like MilkyWay or Einstein often has validation issues because HR is not in use. I think MW used to use AR for Separation at one stage (possibly without HR) but at some point that seemed to stop. I don't think I ever saw AR for N-body. (Einstein does use AR for at least one of its non-GPU applications...) |
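As a small illustration of the execution-order point and of why a checksum-style validator is so strict, the sketch below adds the same three numbers in two different orders. The two results differ in the last bit, so a checksum over the raw bytes disagrees even though a tolerance-based comparison would accept the pair. The `checksum` helper and the choice of SHA-256 are assumptions made for the example; nothing here is taken from the actual ARP1 or MCM1 validators.

```python
import hashlib
import math
import struct


def checksum(values):
    """SHA-256 over the values packed as little-endian IEEE-754 doubles."""
    data = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(data).hexdigest()


# The same three terms, summed in two different orders, as two compilers
# might legitimately arrange them.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(a == b)                               # False: last-bit difference
print(checksum([a]) == checksum([b]))       # False: binary comparison fails
print(math.isclose(a, b, rel_tol=1e-12))    # True: tolerance check passes
```

A validator that compares with a tolerance can pair results across platforms; one that compares bytes or checksums cannot, which fits the reasoning above about why ARP1 would need HR.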
||
|