World Community Grid Forums
Thread Status: Active Total posts in this thread: 119
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12559 Status: Offline
Only 3 extremes and 6 accelerated validated in the last day.
The movable extremes will have all moved on to accelerated within a week.
Mike

Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12559 Status: Offline
Another day and only 1 extreme and 1 accelerated have moved on.
The extreme is now in generation 127 and the accelerated in generation 135. 463 normals moved.
Mike

Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 1114 Status: Offline
I want to state that I KNOW this is a normal and not an extreme. We have few extremes moving at the moment as far as I can tell, so I thought I'd post my lowest generation for fun until the extremes move again.
ARP1_0033794_138_0 Darwin 24.3.0 In Progress 2025-03-19 02:48:53 UTC 2025-03-25 02:48:53 UTC
ARP1_0033794_138_1 Darwin 24.3.0 Pending Validation 2025-03-19 02:48:09 UTC 2025-03-19 07:19:58 UTC 4.47 / 4.47 499.2 / 0
https://www.worldcommunitygrid.org/contribution/workunit/683320205
Doesn't look like my wingman is going to do this WU, so maybe one of you will get it in a few days.

hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 837 Status: Offline
Would it be possible to design a script to automate the distribution of the stuck work units? (On the server side, of course, not on the user side.)
Not sure if the stuck work units became that way because they errored out or if they just need longer deadlines to process because the terrain and weather are more complicated.

Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12559 Status: Offline
WCG have said that they will be looking at the stuck units after MAM1 starts.
No doubt they will analyse why they are stuck and group them into possible strategies. Did they just need more time? Will shortening the TimeStep suffice? Do they need to backtrack some generations and shorten the TimeStep? Any other ideas?
Mike

alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1057 Status: Recently Active
Mike,
Good summary and comments! I'll just add that there will be some that got stuck because of errors that have nothing to do with time steps -- witness that recent Mac WU (highlighted by Unixchick) that had 6 results that wouldn't validate, and some WUs from way back that had lots of download errors and/or runtime errors[*1] but had one task that was waiting to validate (so almost certainly not a time step issue!) which ended up "Too Late" (really "Can't use", but...)
As for analysing why they are stuck -- I wonder if the ARP1 support system already stores a record of the entire task returns for all failed units or whether they'll have to identify each stalled cell's last WU and try to dig the error states out of the result archives somehow. I imagine the latter wouldn't be a bundle of fun...
Whilst simply trying to "restart" cells with a longer time step "in case" is possible, it's not an ideal solution, especially for Extreme generations!...
Cheers - Al.
*1 -- I seem to recall having seen SIGILL (usually, but not always, from FreeBSD) and SIGSEGV on WUs sent to Linux hosts; unfortunately, I can't find the notes I took at the time :-(
[Edit 1 times, last edit by alanb1951 at May 10, 2025 9:03:28 PM]
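For anyone curious what "identify each stalled cell's last WU" might look like in practice, here is a minimal sketch in Python. It assumes that a result name such as ARP1_0033794_138_0 breaks down as project, cell, generation and copy number; that layout is a guess based on the names and generation numbers quoted in this thread, not something WCG has documented here. The function names are made up for illustration, and this says nothing about how the real ARP1 back end stores its results.

```python
from typing import Dict, Iterable, NamedTuple


class ArpResult(NamedTuple):
    cell: str        # assumed: grid cell identifier, e.g. "0033794"
    generation: int  # assumed: generation number, e.g. 138
    copy: int        # assumed: copy/replica index of the result, e.g. 0 or 1


def parse_result_name(name: str) -> ArpResult:
    """Split a name like 'ARP1_0033794_138_0' into its assumed fields."""
    parts = name.split("_")
    if len(parts) != 4 or parts[0] != "ARP1":
        raise ValueError(f"not an ARP1 result name: {name!r}")
    _, cell, generation, copy = parts
    return ArpResult(cell, int(generation), int(copy))


def last_generation_per_cell(names: Iterable[str]) -> Dict[str, int]:
    """Return the highest generation seen for each cell."""
    latest: Dict[str, int] = {}
    for name in names:
        result = parse_result_name(name)
        previous = latest.get(result.cell, -1)
        latest[result.cell] = max(previous, result.generation)
    return latest


if __name__ == "__main__":
    names = [
        "ARP1_0033794_138_0",  # quoted earlier in the thread
        "ARP1_0033794_138_1",  # quoted earlier in the thread
        "ARP1_0000001_127_0",  # hypothetical name, for illustration only
    ]
    print(last_generation_per_cell(names))
```

Digging the actual error states out of the archives would still be the hard part; this only shows the bookkeeping step of finding where each cell stopped.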
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12559 Status: Offline
Al
Some of the errors/verifications could relate to not matching the actual weather that occurred.
The storage issue shouldn't be a problem because IBM were talking about winding back 1 or 2 generations if shortening the TimeStep didn't work. But whether they have been kept this long is another matter.
Mike

alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 1057 Status: Recently Active
Quoting Mike: "Some of the errors/verifications could relate to not matching the actual weather that occurred."
Thanks for mentioning that possibility! I doubt they'd want an exact match but it would be interesting to know how much tolerance would be allowed before that caused a failure and, possibly, the need to try a shorter time step to see if it got a better match.
I'm presuming that each new generation is seeded with actual starting weather (rather than using whatever came out of the previous run, which might be more appropriate if trying to do a slightly longer look-ahead test...)
Cheers - Al.
[Edit 1 times, last edit by alanb1951 at May 11, 2025 2:08:04 AM]

Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12559 Status: Offline
Al
If they were to scan ahead they would need a pretty close match as otherwise they could diverge quite quickly.
Mike
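To illustrate the point about divergence, here is a toy sketch in Python. It uses the logistic map, a standard textbook example of a chaotic system, and has nothing to do with the actual ARP1 weather model; the starting value, the size of the mismatch and the threshold are all arbitrary choices for illustration. It counts how many steps two runs that start almost identically take before they disagree noticeably.

```python
from typing import Optional


def logistic_step(x: float, r: float = 4.0) -> float:
    """One step of the logistic map x -> r * x * (1 - x); chaotic for r = 4."""
    return r * x * (1.0 - x)


def steps_until_divergence(
    x0: float,
    delta: float,
    threshold: float = 0.1,
    max_steps: int = 1000,
) -> Optional[int]:
    """Count steps until runs started at x0 and x0 + delta differ by more
    than threshold. Returns None if that never happens within max_steps."""
    a, b = x0, x0 + delta
    for step in range(1, max_steps + 1):
        a, b = logistic_step(a), logistic_step(b)
        if abs(a - b) > threshold:
            return step
    return None


if __name__ == "__main__":
    # Two runs whose starting values differ by one part in a billion.
    print(steps_until_divergence(0.2, 1e-9))
    # Identical starting values never separate.
    print(steps_until_divergence(0.2, 0.0))
```

In this toy the gap between the two runs can grow by up to a factor of four per step, so even a tiny initial mismatch becomes large after a few dozen steps, which is the kind of behaviour a longer look-ahead would have to contend with.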