Total posts in this thread: 260
This topic has been viewed 203398 times and has 259 replies
TigerLily
Senior Cruncher
Joined: May 26, 2023
Post Count: 280
Re: Project Status (First Post Updated)

Hi all,

I followed up with the tech team yesterday regarding the reduced work unit flow many of you discussed in this thread. They are currently working on getting ARP1 started, which should increase work unit availability. Unfortunately, it is not feasible for MCM1 work unit supply to be increased as the limiter is the database. They are testing an upgrade which may enable the database to serve more requests. We will be able to provide further updates about work unit supply next week.
[Apr 10, 2024 2:50:37 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7545
Re: Project Status (First Post Updated)

Tiger Lily:
Thank you for the information. It does provide some insight into the present situation.
Edit: By the way, do you have any information on what the daily limit is for the database?
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Apr 10, 2024 3:23:46 PM]
[Apr 10, 2024 3:06:01 PM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 292
Re: Project Status (First Post Updated)

TigerLily,
Your feedback is greatly appreciated whether it is in the forums or via email!
Thank you,
Buce
[Apr 10, 2024 6:03:31 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 835
Re: Project Status (First Post Updated)

Thank you TigerLily !!!

It is nice to know what the tech team is working on now. I'm sure they have a long list, and it makes me personally happy to know that ARP is on the top.
[Apr 10, 2024 8:20:06 PM]
gj82854
Advanced Cruncher
Joined: Sep 26, 2022
Post Count: 57
Re: Project Status (First Post Updated)

Won't ARP and MCM use the same database? If the database is limited to 1000, for example, doesn't that mean 500 to ARP and 500 to MCM versus 1000 to MCM? How does ARP increase the workunit availability?
[Apr 11, 2024 2:19:40 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 858
Re: Project Status (First Post Updated)

Won't ARP and MCM use the same database? If the database is limited to 1000, for example, doesn't that mean 500 to ARP and 500 to MCM versus 1000 to MCM? How does ARP increase the workunit availability?
I get the impression that some of the current issues might be on the work generation side rather than the feeder side (but I could be misinterpreting the occasional snippet of information we do get fed!) It would be consistent with persistent "No tasks available" messages, as the transitioner can't satisfy requests when there's nothing to offer to the users :-)

As for whether the feeders try for equality for all applications, I don't think so -- I think it's based on tasks queued by the transitioner when requests that can be satisfied come in or when retries are needed. It's then down to a combination of the queued requests and the order in which the feeder has been instructed to collect them, which should be application-agnostic under normal circumstances.

When available, the much longer-running ARP1 tasks should help reduce the demand pressure for MCM1 tasks... However, as available ARP1 work is constrained by how its tasks are generated, that alone won't get us back up to the "2 million results returned each day" of 18 or so months ago...
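
To put rough numbers on the point about longer-running tasks, here is a minimal sketch. Every figure in it (core count, task runtimes) is a made-up assumption for illustration, not a WCG statistic, and it only counts task turnover, not the size of each transfer.

```python
# Hypothetical figures for illustration only; these are NOT actual WCG numbers.
CORES = 100_000      # assumed number of CPU cores crunching
MCM1_HOURS = 2.0     # assumed runtime of one short (MCM1-like) task
ARP1_HOURS = 12.0    # assumed runtime of one long (ARP1-like) task


def daily_task_turnover(cores: float, hours_per_task: float) -> float:
    """Tasks that `cores` busy cores finish, and so must fetch and report, per day."""
    return cores * 24.0 / hours_per_task


def mixed_load(arp_share: float) -> float:
    """Daily task turnover when a fraction `arp_share` of the cores runs long tasks."""
    arp_cores = CORES * arp_share
    mcm_cores = CORES - arp_cores
    return (daily_task_turnover(mcm_cores, MCM1_HOURS)
            + daily_task_turnover(arp_cores, ARP1_HOURS))


if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5):
        print(f"{share:.0%} of cores on long tasks: {mixed_load(share):,.0f} tasks/day")
```

Under these assumptions, moving half the cores to the long tasks cuts the number of tasks that have to be handed out and reported each day from 1,200,000 to 700,000, which is the sense in which ARP1 could ease the demand for MCM1 work without the database serving more requests.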

Hopefully whatever tuning they are trying will help resolve this -- only time will tell :-)

Cheers - Al.
----------------------------------------
[Edit 2 times, last edit by alanb1951 at Apr 11, 2024 4:26:23 AM]
[Apr 11, 2024 4:17:52 AM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 835
Re: Project Status (First Post Updated)

The on-site stats say that we are returning 900k results a day. I think our peak was 1400k results, but no other project was running at the time. I'm guessing the lower amount is needed to allow clean-up routines to access the db, so we don't have runaway result lists like we had in the past.

ARP WU sending tends to stress out the system in a different way as the WU is large, but hopefully since they take longer to process, it is less access to the db overall.

We need to give the techs the time to find the balance between MCM and ARP the system can handle, and not whine too much about crashes and slowdowns as they work out the sweet spot.

I really hope they bring back ARP before I need to limit my WUs due to the heat of summer.
[Apr 11, 2024 4:04:19 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 858
Re: Project Status (First Post Updated)

Unixchick, thanks for that post - lots to like and agree with!

One of the key points it makes is something which I didn't stress in my "technical" reply to gj82854:
We need to give the techs the time to find the balance between MCM and ARP the system can handle, and not whine too much about crashes and slowdowns as they work out the sweet spot.
I sometimes think that a lot of users choose to forget that the new WCG is probably trying to run the same services on less hardware than was available to IBM, and in a different environment!...

I also wonder how many users think all projects (WCG and elsewhere) have infinite supplies of work to process -- it doesn't tend to work like that most of the time (which is why some folks who only do health or genetics projects see their backup projects out of work as well...) A classic example of that is TnGrid (Gene projects), which is my main project on a Raspberry Pi since OPN1 went away, but where work is [understandably] so infrequent that my fallback there (Einstein) gets a lot of time!

[Editorial: WCG is not the best place to come to if RAC (recent average credit) is one's primary motivation :-)]

Cheers - Al.

P.S. my other current projects on non-ARM systems are Einstein and MilkyWay so my systems are rarely short of work to do, even if the balance isn't always what I'd want :-) -- it's the science that matters, not the amount of it!

[Edit: slight rewording of second paragraph.]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Apr 11, 2024 5:58:48 PM]
[Apr 11, 2024 5:41:30 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 858
Re: Project Status (First Post Updated)

For information:

In case anyone gets worried about validated work apparently not disappearing, it's not "the assimilators again" :-)

It appears that they turned off the database purge between 12:20 and 12:30 UTC on 2024-04-11, and it hadn't been restarted by the end of their working day...

This also happened from about the same time on 2024-04-03 for 24 hours, and on that occasion it only took it a couple of hours to catch up on restart! I posted about it in one of the News threads, and in my next post there (when it came back) I asked:
Was it a scheduled [but unannounced] outage, or is there usually a restart each day at that time and yesterday it just didn't [re]start???
I didn't get an answer, but I wasn't really expecting one. Given this repeat, it looks as if it was/is planned :-)

Cheers - Al.
[Apr 12, 2024 6:01:00 AM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 2068
Re: Project Status (First Post Updated)

Anyone else seeing this message, which also seems to stop the client from getting new tasks? It started just under an hour ago or so: "Another scheduler instance is running for this host"
I think I have seen that message before, maybe more than a year ago.

World Community Grid 2024-04-12 15:07:42 Requesting new tasks for CPU
World Community Grid 2024-04-12 15:07:44 Scheduler request completed: got 0 new tasks
World Community Grid 2024-04-12 15:07:44 Another scheduler instance is running for this host
World Community Grid 2024-04-12 15:07:44 Project requested delay of 121 seconds

Edit, added: Tasks can't be reported either. Request after request, the client tries to report the same task, but fails:

World Community Grid 2024-04-12 15:30:12 Sending scheduler request: To fetch work.
World Community Grid 2024-04-12 15:30:12 Reporting 1 completed tasks
World Community Grid 2024-04-12 15:30:12 Requesting new tasks for CPU
World Community Grid 2024-04-12 15:30:14 Scheduler request completed: got 0 new tasks
World Community Grid 2024-04-12 15:30:14 Another scheduler instance is running for this host
World Community Grid 2024-04-12 15:30:14 Project requested delay of 121 seconds
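
For anyone who wants to check how often their own client hits this, a small sketch follows. It assumes log lines shaped exactly like the ones pasted above (project name, date, time, message, separated by single spaces); lines copied from elsewhere may be laid out differently, and the names `parse_line` and `blocked_times` are made up for this example.

```python
import re
from datetime import datetime

# Assumed layout, taken from the pasted lines above:
# "<project name> <YYYY-MM-DD> <HH:MM:SS> <message>"
LINE_RE = re.compile(
    r"^(?P<project>.+?) "
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<message>.*)$"
)

BLOCKED = "Another scheduler instance is running for this host"


def parse_line(line):
    """Split one pasted log line into (project, timestamp, message), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M:%S"
    )
    return match["project"], stamp, match["message"]


def blocked_times(lines):
    """Timestamps of every line carrying the 'another scheduler instance' message."""
    times = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[2] == BLOCKED:
            times.append(parsed[1])
    return times


if __name__ == "__main__":
    sample = [
        "World Community Grid 2024-04-12 15:30:12 Reporting 1 completed tasks",
        "World Community Grid 2024-04-12 15:30:14 Scheduler request completed: got 0 new tasks",
        "World Community Grid 2024-04-12 15:30:14 Another scheduler instance is running for this host",
        "World Community Grid 2024-04-12 15:30:14 Project requested delay of 121 seconds",
    ]
    for stamp in blocked_times(sample):
        print(stamp.isoformat(sep=" "))
```

Feeding it a longer stretch of pasted log lines gives the times of each blocked request, which makes it easy to see when the problem started and stopped.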
----------------------------------------
[Edit 1 times, last edit by Grumpy Swede at Apr 12, 2024 1:37:15 PM]
[Apr 12, 2024 1:20:49 PM]