Thread Status: Active | Total posts in this thread: 15
Jim1348
Veteran Cruncher | USA | Joined: Jul 13, 2009 | Post Count: 1066 | Status: Offline
Quote:
"The first thing I saw was that two ARP1 WUs were 'waiting to run'. Then I realised that five others were running. That meant that SEVEN lots of memory had been allocated for these WUs -- not leaving much for the other things I'd like to run from time to time."

You can eliminate the extra memory allotted for those "waiting to run" tasks by de-selecting "Leave non-GPU tasks in memory". It is no big deal for performance; you just have to load an application from the disk again when it is needed.
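For reference, the same setting can also be changed outside the Manager with a global_prefs_override.xml in the BOINC data directory. This is only a minimal sketch of that approach; the tag shown is the client's standard preference name as far as I know, so treat it as an assumption and check it against your own prefs file:

<global_preferences>
   <!-- 0 = do not keep non-GPU tasks in memory when they are preempted -->
   <leave_apps_in_memory>0</leave_apps_in_memory>
</global_preferences>

Save the file and have the client re-read it (Options > Read local prefs file in the Manager, if memory serves, or simply restart the client); ticking or un-ticking the checkbox in Computing preferences amounts to the same thing.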
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1406 | Status: Offline
Quote (Jim1348):
"You can eliminate the extra memory allotted for those "waiting to run" by de-selecting "Leave non-GPU tasks in memory". It is no big deal for performance; you just have to load an application from the disk again when it is needed."

. . . and for the ARP1s you will lose hours of crunching time: an unloaded task resumes from its last checkpoint, and ARP1 checkpoints only infrequently.
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12594 | Status: Offline
Apis

I suspect from what you initially said that you have 8 threads available on your machine. That is how many I have, and I find that with 6 running arp1 it runs slower. I would suggest that you reduce the number of threads allotted to arp1 to 4; the other projects can share the other 4, and you will boost total output.

As for re-sends, if they go out soon after the initial send (say up to about 24 hours) they come with the standard 7-day return deadline, but if they go out later, they come with 3.5 days to finish. Even with the shorter deadline they don't necessarily jump the queue immediately, but they might do so later.

Mike
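A concrete way to cap arp1 at 4 is an app_config.xml in the World Community Grid project folder under the BOINC data directory. A minimal sketch, assuming arp1 is the short application name your client reports for the Africa Rainfall Project (check the task names in the Manager before using it):

<app_config>
   <app>
      <!-- run at most 4 Africa Rainfall Project tasks at a time -->
      <name>arp1</name>
      <max_concurrent>4</max_concurrent>
   </app>
</app_config>

The client picks the file up after Options > Read config files in the Manager (or a client restart), and the remaining threads are then free for the other sub-projects.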
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hi Mike,

Thanks for your feedback and comments.

I had been running MIP1 and found that I couldn't run too many (more than 4, IIRC) without impacting performance, presumably due to the well-known L3 cache 'problem'. With ARP1 running as well as MIP1, I found I needed to run even fewer MIP1 or it impacted ARP1. But now that I run (usually) 6 ARP1 and 2 MCM1, I don't think ARP1 is falling over itself. However, I tend to look at points/hr and not elapsed time, so the situation may not be straightforward.

As to the scheduling issue, I did subsequently see another situation where a seventh ARP1 WU was shown in status 'Waiting to run', but it had an elapsed time of 0 and wasn't using much memory. When one of the executing ARP1 WUs finished, I was surprised to find that it hadn't started, but another one had. Nothing I tried would get that one to start properly, and in the end I killed it. I put it down to 'Just one of those (annoying) things'.
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12594 | Status: Offline
apis

My primary purpose is to help the projects, with badge-hunting secondary, so I have been juggling combinations in Device Profiles and app_config to try to achieve the best result.

My current app_config settings when I have arp1 units are 3 arp1, 4 mcm1 & 3 mip1. That tends to allow me to run 3 arp1, 3 mcm1 & 2 mip1. If I have no arp1 units, like now, I put mcm1 up to 6 and mip1 up to 4. That gives me 5 mcm1 and 3 mip1.

My Device Profiles are set to 6 arp1 (3*2 to keep the machine 'reliable'), 6 mcm1 and 4 mip1 at all times, so that updates occur regularly to try to get more arp1 whenever possible. As the capacity problems with arp1 and mip1 are different, this maximises the throughput.

I also temporarily pause any units which get too close to the one in front, to spread the peaks. In other words, I prevent 'tailgating'.

My current problem is coming next on the other thread.

Mike
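For anyone who wants to copy this set-up, here is a minimal app_config.xml sketch of the 'arp1 units available' case described above. The short names arp1, mcm1 and mip1 are assumed to match what your client reports for the Africa Rainfall Project, Mapping Cancer Markers and the Microbiome Immunity Project, so verify them before use:

<app_config>
   <app>
      <!-- Africa Rainfall Project -->
      <name>arp1</name>
      <max_concurrent>3</max_concurrent>
   </app>
   <app>
      <!-- Mapping Cancer Markers -->
      <name>mcm1</name>
      <max_concurrent>4</max_concurrent>
   </app>
   <app>
      <!-- Microbiome Immunity Project -->
      <name>mip1</name>
      <max_concurrent>3</max_concurrent>
   </app>
</app_config>

The file lives in the World Community Grid project folder under the BOINC data directory and takes effect after Options > Read config files or a client restart; for the 'no arp1' case you would simply raise mcm1 to 6 and mip1 to 4 as described above.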