Thread Status: Active
Total posts in this thread: 119
This topic has been viewed 9878 times and has 118 replies
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12559
Status: Offline
Re: The Extremes thread

Only 3 extremes and 6 accelerated validated in the last day.

The movable extremes will have all moved on to accelerated within a week.

Mike
[Feb 25, 2025 9:11:05 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12559
Status: Offline
Re: The Extremes thread

Another day and only 1 extreme and 1 accelerated have moved on.

The extreme is now in generation 127 and the accelerated in generation 135.

463 normals moved.

Mike
[Feb 26, 2025 3:31:22 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1114
Status: Offline
Re: The Extremes thread

I want to state that I KNOW this is a normal and not an extreme. We have few extremes moving at the moment as far as I can tell, so I thought I'd post my lowest generation for fun until the extremes move again.

ARP1_0033794_138_0 | Darwin 24.3.0 | In Progress | 2025-03-19 02:48:53 UTC | 2025-03-25 02:48:53 UTC
ARP1_0033794_138_1 | Darwin 24.3.0 | Pending Validation | 2025-03-19 02:48:09 UTC | 2025-03-19 07:19:58 UTC | 4.47 / 4.47 | 499.2 / 0

https://www.worldcommunitygrid.org/contribution/workunit/683320205

Doesn't look like my wingman is going to do this WU, so maybe one of you will get it in a few days.
[Mar 23, 2025 3:35:13 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 837
Status: Offline
Re: The Extremes thread

Would it be possible to design a script to automate the distribution of the stuck work units? (On the server side, of course, not on the user side)

Not sure whether the stuck work units became that way because they errored out or because they just need longer deadlines to process, since the terrain and weather are more complicated.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[May 9, 2025 8:59:45 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12559
Status: Offline
Re: The Extremes thread

WCG have said that they will be looking at the stuck units after MAM1 starts.

No doubt they will analyse why they are stuck and group them into possible strategies.

Did they just need more time?
Will shortening the TimeStep suffice?
Do they need to backtrack some generations and shorten the TimeStep?
Any other ideas?

Mike
[May 10, 2025 12:47:48 PM]
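
The questions in the post above amount to a triage of the stuck units. A minimal sketch of how such a grouping might look, purely as an illustration: the record fields (`failure`, `retries_with_short_step`), the failure labels, and the unit names are all hypothetical and are not taken from WCG's or the ARP1 team's actual tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StuckUnit:
    """Hypothetical record for a stalled work unit (not WCG's real schema)."""
    name: str
    failure: str                      # assumed labels: "deadline", "unstable", "other"
    retries_with_short_step: int = 0  # earlier attempts with a shortened time step


def choose_strategy(unit: StuckUnit) -> str:
    """Map a stuck unit onto one of the strategies raised in the thread."""
    if unit.failure == "deadline":
        # "Did they just need more time?"
        return "extend deadline"
    if unit.failure == "unstable":
        if unit.retries_with_short_step == 0:
            # "Will shortening the TimeStep suffice?"
            return "shorten time step"
        # "Backtrack some generations and shorten the TimeStep?"
        return "backtrack generations and shorten time step"
    # Anything else (download errors, signals, validation mismatches, ...)
    return "investigate manually"


def group_by_strategy(units):
    """Return {strategy: [unit names]} for a list of StuckUnit records."""
    groups = defaultdict(list)
    for unit in units:
        groups[choose_strategy(unit)].append(unit.name)
    return dict(groups)


# Made-up example units, for illustration only.
units = [
    StuckUnit("cell_A", "deadline"),
    StuckUnit("cell_B", "unstable"),
    StuckUnit("cell_C", "unstable", retries_with_short_step=1),
    StuckUnit("cell_D", "other"),
]

for strategy, names in group_by_strategy(units).items():
    print(f"{strategy}: {names}")
```

The point of the sketch is only that each stuck unit ends up in exactly one bucket, so each bucket can be handled in bulk; the real categories would depend on what the project's logs actually record.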
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1057
Status: Recently Active
Re: The Extremes thread

Mike,

Good summary and comments!

I'll just add that some will have got stuck because of errors that have nothing to do with time steps. Witness that recent Mac WU (highlighted by Unixchick) that had 6 results that wouldn't validate, and some WUs from way back that had lots of download errors and/or runtime errors[*1] but had one task waiting to validate (so almost certainly not a time step issue!) and ended up "Too Late" (really "Can't use", but...)

As for analysing why they are stuck -- I wonder if the ARP1 support system already stores a record of the entire task returns for all failed units or whether they'll have to identify each stalled cell's last WU and try to dig the error states out of the result archives somehow. I imagine the latter wouldn't be a bundle of fun...

Whilst simply trying to "restart" cells with a longer time step "in case" is possible, it's not an ideal solution, especially for Extreme generations!

Cheers - Al.

*1 -- I seem to recall having seen SIGILL (usually, but not always, from FreeBSD) and SIGSEGV on WUs sent to Linux hosts; unfortunately, I can't find the notes I took at the time :-(
----------------------------------------
[Edit 1 times, last edit by alanb1951 at May 10, 2025 9:03:28 PM]
[May 10, 2025 9:02:01 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12559
Status: Offline
Re: The Extremes thread

Al

Some of the errors/verifications could relate to not matching the actual weather that occurred.

The storage issue shouldn't be a problem because IBM were talking about winding back 1 or 2 generations if shortening the TimeStep didn't work. But whether they have been kept this long is another matter.

Mike
[May 11, 2025 12:33:29 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1057
Status: Recently Active
Re: The Extremes thread

"Some of the errors/verifications could relate to not matching the actual weather that occurred."

Thanks for mentioning that possibility!

I doubt they'd want an exact match but it would be interesting to know how much tolerance would be allowed before that caused a failure and, possibly, the need to try a shorter time step to see if it got a better match.

I'm presuming that each new generation is seeded with actual starting weather (rather than using whatever came out of the previous run, which might be more appropriate if trying to do a slightly longer look-ahead test...)

Cheers - Al.
----------------------------------------
[Edit 1 times, last edit by alanb1951 at May 11, 2025 2:08:04 AM]
[May 11, 2025 2:07:19 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12559
Status: Offline
Re: The Extremes thread

Al

If they were to scan ahead they would need a pretty close match as otherwise they could diverge quite quickly.

Mike
[May 11, 2025 12:02:10 PM]
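
The point about runs diverging quickly unless the starting states match closely can be illustrated with a toy chaotic system. The sketch below uses the logistic map, which is a standard textbook example and has nothing to do with the ARP1 weather model itself; the function names, the starting values, and the 0.1 threshold are arbitrary choices made for this illustration.

```python
def logistic_step(x, r=4.0):
    """One iteration of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def steps_until_divergence(x0, y0, threshold=0.1, max_steps=1000):
    """Iterate two trajectories side by side.

    Returns the first step at which they differ by more than `threshold`,
    or None if that never happens within `max_steps`.
    """
    x, y = x0, y0
    for step in range(1, max_steps + 1):
        x, y = logistic_step(x), logistic_step(y)
        if abs(x - y) > threshold:
            return step
    return None


# Two starting states that differ by only one part in a billion.
print(steps_until_divergence(0.2, 0.2 + 1e-9))

# Identical starting states never separate.
print(steps_until_divergence(0.2, 0.2))
```

Even a tiny initial mismatch grows to a large difference after a few dozen iterations in this toy system, which is the general behaviour the post describes.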