Thread Status: Active
Total posts in this thread: 387
This topic has been viewed 48627 times and has 386 replies
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: Project Status (First Post Updated)

It's not entirely clear to me what the hold-up is with the "stuck" tasks, not just the 21 and 21 Ultras but the other Extreme stragglers. Can't they be manually kicked off? Were they all tasks that errored out, or just ones that took 30+ hours to compute? I'm not sure we ever got a root cause analysis.

gl82854 said:
WCG should ignore the ultras. Let Delft run those on their systems.

Indeed. Running them 24/7 through an automated process on a couple machines would eliminate the hassle of deadlines and really speed them up.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Mar 17, 2025 9:43:24 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Project Status (First Post Updated)

hchc

I don't think we ever had a reason for the ultras. Many units that stuck initially were restarted with a shorter TimeStep, which meant cutting the 2-day period into smaller slices, and it worked.

I don't know how much shorter they made it. It did have the effect of lengthening the CPU time needed. It could be that they kept the shortened deadline for resends while the CPU time went up because of the TimeStep. That would mean failing to meet deadlines.

If the TimeStep is reduced, the deadline should be increased.
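
To put rough numbers on that, here is a minimal sketch. It assumes CPU time grows in proportion to the number of steps, so halving the TimeStep doubles the work. The function name and every figure in it are hypothetical, for illustration only, and are not taken from the actual ARP configuration.

```python
def scaled_deadline_hours(base_deadline_hours, base_timestep_s, new_timestep_s):
    """Scale a deadline by the ratio of old to new TimeStep.

    Assumption (not confirmed by the project): CPU time is proportional
    to the number of steps, i.e. inversely proportional to the TimeStep.
    """
    if base_timestep_s <= 0 or new_timestep_s <= 0:
        raise ValueError("timesteps must be positive")
    # Smaller step -> more steps for the same simulated period.
    step_ratio = base_timestep_s / new_timestep_s
    return base_deadline_hours * step_ratio


# Hypothetical example: a 24-hour deadline with the TimeStep halved from 60 s to 30 s.
print(scaled_deadline_hours(24, 60, 30))
```

Under that assumption a resend with half the TimeStep would need about twice the deadline just to stay even.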

My guess is that the problem might stem from difficult terrain where the patch is located, causing problems calculating the weather.

This is a matter which is specific to ARP so should be posted on the ARP forum. This is a general forum.

Mike
[Mar 17, 2025 10:43:50 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Recently Active
Re: Project Status (First Post Updated)

MCM is flowing well. Hopefully everyone has as many MCM as they want and caches are full.
ARP was increasing its flow, but seems to have topped out (reference number between 400 and 500).
[Mar 21, 2025 3:19:47 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Recently Active
Re: Project Status (First Post Updated)

MCM is flowing well.
ARP may be stopped or have a lighter than usual flow. We have been working up to a reference number of 400-500 and now it is in the 300s, so hopefully someone will throw more ARPs into the hopper.
[Mar 22, 2025 3:10:03 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2494
Status: Offline
Re: Project Status (First Post Updated)

MCM tasks have dried up totally. I do not know about ARP tasks though, since I do not crunch those.
[Mar 23, 2025 10:16:11 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Recently Active
Re: Project Status (First Post Updated)

MCM seems to be flowing again. I did run out of my short queue of MCM WUs and got the dreaded "Tasks are committed to other platforms" message, but I've got MCM again.
ARP seems to be not flowing at all except for the occasional resend. Hopefully the team will fix that on Monday morning.
[Mar 24, 2025 12:22:14 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 1317
Status: Offline
Re: Project Status (First Post Updated)

Between about 17:30 and 22:00 (UTC) on 2025-03-23 I was getting nothing but retries for missed deadline cases (and, of course, about 20% of them got Server Aborted).

Fortunately, with the retries my buffers were just large enough to not have completely run out anyway, but it's irritating to have idle cores because they [currently] don't seem to be able to send out missed deadline retries for MCM1 unless they stop feeding new work.

Hopefully, once all the new hardware is in place it might be possible to do something about that, though it might require some nifty scheduling of multiple feeders to mix tasks for the complete range of active WU numbers at any given time :-)

Cheers - Al.
[Mar 24, 2025 5:54:49 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 1294
Status: Recently Active
Re: Project Status (First Post Updated)

MCM flow is higher than I've seen it in a while. Reference number in the 70k range.
Al is right that the system has a problem sending out MCM resends mixed in with new WUs. When the resends go out in a batch, some of us (non-Windows people??) get the dreaded "committed to other platforms" message.

ARP has some retries rattling around, but no fresh WUs in a while.

I hope the techs are getting some time to work on good things and not bandaiding the old system stuff.
[Mar 25, 2025 2:45:13 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2494
Status: Offline
Re: Project Status (First Post Updated)

It's not only non-Windows people that get the dreaded "committed to other platforms" message. I'm Windows only, and I often get those messages too.
[Mar 25, 2025 6:07:51 PM]
StFreddy
Cruncher
Joined: Oct 2, 2019
Post Count: 3
Status: Offline
Re: Project Status (First Post Updated)

I recently noticed that new MCM workunits now take 2 hours 30 min to complete on my machine, which is 1 hour longer than before. For years, it took 1:30 on my machine to complete 1 WU. Has anyone else noticed this difference?
[Mar 25, 2025 8:09:58 PM]