World Community Grid Forums
Thread Status: Active | Total posts in this thread: 12
wujj123456
Cruncher | Joined: Jun 9, 2010 | Post Count: 38 | Status: Offline
This happened to one of my computers early this week and happened again to another one yesterday. For whatever reason, they suddenly kept requesting lots of work, way beyond the buffer preference and what the computer can process in time. I have work_buf_min_days set to 0.3 and work_buf_additional_days set to 0.2, but got 1000+ WUs that would take at least a few days to complete, with many tasks likely unable to start before the deadline.
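(For reference, those two values are the work buffer settings stored in the client's global_prefs_override.xml. The snippet below is only an illustration of the values I described, not a copy of my actual file.)

<global_preferences>
    <work_buf_min_days>0.3</work_buf_min_days>
    <work_buf_additional_days>0.2</work_buf_additional_days>
</global_preferences>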
For this second occurrence, I happen to have a script recording the work level every 10 minutes. All timestamps below are in PDT.

2021-06-18 21:00:02,133 - INFO - World Community Grid | 75 tasks, 18 running

During that hour or so the client just kept fetching every two minutes, because that's as fast as the project allowed:

Jun 18 21:04:41 S8026 boinc[1459]: 18-Jun-2021 21:04:41 [World Community Grid] Sending scheduler request: To fetch work.

The two computers don't have much in common. One is Windows 10 with the 7.16.11 client. The logs above are from my Ubuntu 20.04 server, running the 7.16.6 client. They have different WCG profiles, but I've set a limit on ARP tasks for both. They both run multiple BOINC projects including WCG, and they've been working well for quite a few months. So far only WCG has had these two instances of excessive fetching.

I haven't really changed any local settings before this happened. The only setting I have touched in the past few weeks is the project WU limit in the WCG profile, but that's not the first time I've changed that either. This doesn't seem easily reproducible. After that period, all later logs correctly reported "job cache full", as I have work for more than a few days. For all the fetched work, the estimated remaining times were pretty accurate.

While I don't track WU counts for my Windows machine, I recall the count was also just above 1000 WUs, probably the magic number 1023 or 1024. The two computers have quite different numbers of cores, so stopping the fetch at the same number is interesting. There seems to be a hard limit that stopped the fetching eventually.

I am curious if anyone else has experienced this too? I am trying to figure out if this is a bug on client or server, or if it's some mis-configuration on my side?

----------------------------------------
[Edit 6 times, last edit by wujj123456 at Jun 19, 2021 6:44:40 PM]
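For anyone who wants to capture the same data, here is a rough sketch of the kind of script I use, not the exact one. It assumes boinccmd is on the PATH and that the --get_tasks output prints "project URL:" and "active_task_state:" lines for each task; the parsing may need adjusting for other client versions.

#!/usr/bin/env python3
# Rough sketch: log the WCG task count and running count every 10 minutes.
# May need to be run from the BOINC data directory (or given --passwd) so that
# boinccmd can authenticate to the local client.
import logging
import subprocess
import time

PROJECT_URL = "worldcommunitygrid.org"  # substring identifying WCG tasks
INTERVAL = 600                          # seconds between samples

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s - %(levelname)s - %(message)s")

def sample():
    out = subprocess.run(["boinccmd", "--get_tasks"],
                         capture_output=True, text=True, check=True).stdout
    total = running = 0
    is_wcg = False
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("project URL:"):
            is_wcg = PROJECT_URL in line
            if is_wcg:
                total += 1
        elif is_wcg and line.startswith("active_task_state:") and "EXECUTING" in line:
            running += 1
    logging.info("World Community Grid | %d tasks, %d running", total, running)

while True:
    sample()
    time.sleep(INTERVAL)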
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7753 | Status: Offline
I have been running WCG since 2006 and have never had this happen. One thing you could do is limit the number of work units in the profile. In other words do not have any projects listed with "unlimited."
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1328 | Status: Offline
> I am curious if anyone else has experienced this too? I am trying to figure out if this is a bug on client or server, or if it's some mis-configuration on my side?

Do you use an app_config.xml for WCG with a low <max_concurrent> value for one of the apps?
wujj123456
Cruncher | Joined: Jun 9, 2010 | Post Count: 38 | Status: Offline
> I am curious if anyone else has experienced this too? I am trying to figure out if this is a bug on client or server, or if it's some mis-configuration on my side?
> Do you use an app_config.xml for WCG with a low <max_concurrent> value for one of the apps?

Yes, I have it set for ARP tasks to manage memory usage. This is something I started doing a couple of weeks ago for WCG when ARP tasks suddenly became abundant, but it has been a while too.

Windows host:
Web profile: ARP and MCM limited to 32
Local concurrent limit: ARP limited to 12

Ubuntu host:
Web profile: ARP limited to 16
Local concurrent limit: ARP limited to 12

I have been using a concurrent limit for the CPDN and LHC projects for quite a long time too and haven't seen such behavior there. Was that a known problem for WCG?

[Edit 1 times, last edit by wujj123456 at Jun 20, 2021 2:22:36 AM]
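For completeness, the local concurrent limit is just an app_config.xml in the WCG project directory along these lines. This is a minimal sketch: I am using "arp1" as the ARP app name for illustration, so check the actual short name in client_state.xml before reusing it.

<app_config>
    <app>
        <name>arp1</name>
        <max_concurrent>12</max_concurrent>
    </app>
</app_config>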
wujj123456
Cruncher | Joined: Jun 9, 2010 | Post Count: 38 | Status: Offline
> I have been running WCG since 2006 and have never had this happen. One thing you could do is limit the number of work units in the profile. In other words do not have any projects listed with "unlimited." Cheers

Yeah, I looked into it a bit, but the maximum it lets me put is 64 for each project, which would hit projects with short WUs (like OPN) quite unfairly if I want to keep half a day or a day of buffer. Honestly, that's still better than getting too many WUs I can't finish. If there aren't other solutions, I will go and set a limit for each project so I can't possibly get 1K+ tasks again.
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1328 | Status: Offline
> I am curious if anyone else has experienced this too? I am trying to figure out if this is a bug on client or server, or if it's some mis-configuration on my side?
> Do you use an app_config.xml for WCG with a low <max_concurrent> value for one of the apps?
> Yes, I have it set for ARP tasks to manage memory usage. This is something I started doing a couple of weeks ago for WCG when ARP tasks suddenly became abundant, but it has been a while too.
> -----
> Was that a known problem for WCG?

Try a web preference limit of 12 tasks for ARP and disable/remove/rename app_config.xml for WCG, and see whether the overloading with tasks disappears.
wujj123456
Cruncher | Joined: Jun 9, 2010 | Post Count: 38 | Status: Offline
> Try a web preference limit of 12 tasks for ARP and disable/remove/rename app_config.xml for WCG, and see whether the overloading with tasks disappears.

I know that would work, but the problem is that it only allows 12 ARP tasks in total, whether they are running or not. With the relatively large WU buffer I want, ARP would get allocated far less time. I think I will go and put a limit on every project in the web profile and keep the local concurrent limit. I am still curious whether anyone has details on the exact bug causing this, and whether it is specific to WCG or to the client. (I kind of feel it's the latter, since it's the client asking to fetch work.)
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1328 | Status: Offline
> I am still curious whether anyone has details on the exact bug causing this, and whether it is specific to WCG or to the client. (I kind of feel it's the latter, since it's the client asking to fetch work.)

In my opinion it's a BOINC client bug, but David Anderson is not convinced. It only happens when app_config is in use with max_concurrent at the app level. A work fetch is always triggered during the last three minutes of any running task. Be glad WCG has a request backoff of 121 seconds, otherwise the client would request unneeded work even more often.
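To put rough numbers on it: 3600 / 121 is about 30 scheduler requests per hour, so if each reply grants a few dozen tasks, a single hour of this behavior is already enough to go from the 75 tasks in the logs above to well over 1000.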
nanoprobe
Master Cruncher | Classified | Joined: Aug 29, 2008 | Post Count: 2998 | Status: Offline
Happened to me last night. BOINC downloaded 1000 OPN1 tasks on a 1-day buffer setting that should have fetched 200 max on that machine.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12502 | Status: Offline
All projects are readily available, except HST, so you are unlikely to run out.
Set your project limit for ARP to 1 or 2 above your max_concurrent in app_config, which is best restricted to half your threads. For the others, set the limit to your total number of threads, unless that is more than 64.

Mike
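As an illustration only, with made-up numbers for a hypothetical 24-thread host rather than a recommendation for any particular machine: max_concurrent for ARP in app_config.xml = 12 (half the threads), web profile limit for ARP = 13 or 14 (1 or 2 above that), and web profile limit for each of the other projects = 24.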