World Community Grid Forums
Thread Status: Active | Total posts in this thread: 12
wujj123456
Cruncher | Joined: Jun 9, 2010 | Post Count: 38 | Status: Offline
This happened to one of my computers early this week and happened again to another one yesterday. For whatever reason, they suddenly kept requesting lots of work, way beyond the buffer preference and what the computer can process in time. I have work_buf_min_days set to 0.3 and work_buf_additional_days set to 0.2, but got 1000+ WUs that would take at least a few days to complete, with many tasks likely unable to start before the deadline.
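(For reference, those two values are the work buffer settings stored in the client's global_prefs_override.xml. The snippet below is only an illustration of the values I described, not a copy of my actual file.)

<global_preferences>
    <work_buf_min_days>0.3</work_buf_min_days>
    <work_buf_additional_days>0.2</work_buf_additional_days>
</global_preferences>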
For this second occurrence, I happen to have a script recording the work level every 10 minutes. All timestamps below are in PDT.

2021-06-18 21:00:02,133 - INFO - World Community Grid | 75 tasks, 18 running

During that hour or so the client just kept fetching every two minutes, because that's as fast as the project allowed:

Jun 18 21:04:41 S8026 boinc[1459]: 18-Jun-2021 21:04:41 [World Community Grid] Sending scheduler request: To fetch work.

The two computers don't have much in common. One is Windows 10 with the 7.16.11 client. The logs above are from my Ubuntu 20.04 server, running the 7.16.6 client. They have different WCG profiles, but I've set a limit on ARP tasks for both. They both run multiple BOINC projects including WCG, and they've been working well for quite a few months. So far only WCG has had these two instances of excessive fetching.

I haven't really changed any local settings before this happened. The only setting I have touched in the past few weeks is the project WU limit in the WCG profile, but that's not the first time I've changed that either. This doesn't seem easily reproducible. After that period, all later logs correctly reported "job cache full", as I have work for more than a few days. For all the fetched work, the estimated remaining times were pretty accurate.

While I don't track WU counts for my Windows machine, I recall the count was also just above 1000 WUs, probably the magic number 1023 or 1024. The two computers have quite different numbers of cores, so stopping the fetch at the same number is interesting. There seems to be a hard limit that stopped the fetching eventually.

I am curious if anyone else has experienced this too? I am trying to figure out if this is a bug on client or server, or if it's some mis-configuration on my side?

----------------------------------------
[Edit 6 times, last edit by wujj123456 at Jun 19, 2021 6:44:40 PM]
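For anyone who wants to capture the same data, here is a rough sketch of the kind of script I use, not the exact one. It assumes boinccmd is on the PATH and that the --get_tasks output prints "project URL:" and "active_task_state:" lines for each task; the parsing may need adjusting for other client versions.

#!/usr/bin/env python3
# Rough sketch: log the WCG task count and running count every 10 minutes.
# May need to be run from the BOINC data directory (or given --passwd) so that
# boinccmd can authenticate to the local client.
import logging
import subprocess
import time

PROJECT_URL = "worldcommunitygrid.org"  # substring identifying WCG tasks
INTERVAL = 600                          # seconds between samples

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s - %(levelname)s - %(message)s")

def sample():
    out = subprocess.run(["boinccmd", "--get_tasks"],
                         capture_output=True, text=True, check=True).stdout
    total = running = 0
    is_wcg = False
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("project URL:"):
            is_wcg = PROJECT_URL in line
            if is_wcg:
                total += 1
        elif is_wcg and line.startswith("active_task_state:") and "EXECUTING" in line:
            running += 1
    logging.info("World Community Grid | %d tasks, %d running", total, running)

while True:
    sample()
    time.sleep(INTERVAL)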
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7753 | Status: Offline
I have been running WCG since 2006 and have never had this happen. One thing you could do is limit the number of work units in the profile. In other words do not have any projects listed with "unlimited."
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1328 | Status: Offline
> I am curious if anyone else has experienced this too? I am trying to figure out if this is a bug on client or server, or if it's some mis-configuration on my side?

Do you use an app_config.xml for WCG with a low <max_concurrent> value for one of the apps?
wujj123456
Cruncher | Joined: Jun 9, 2010 | Post Count: 38 | Status: Offline
> I am curious if anyone else has experienced this too? I am trying to figure out if this is a bug on client or server, or if it's some mis-configuration on my side?
> Do you use an app_config.xml for WCG with a low <max_concurrent> value for one of the apps?

Yes, I have it set for ARP tasks to manage memory usage. This is something I started doing a couple of weeks ago for WCG when ARP tasks suddenly became abundant, but it has been a while too.

Windows host:
Web profile: ARP and MCM limited to 32
Local concurrent limit: ARP limited to 12

Ubuntu host:
Web profile: ARP limited to 16
Local concurrent limit: ARP limited to 12

I have been using a concurrent limit for the CPDN and LHC projects for quite a long time too and haven't seen such behavior there. Was that a known problem for WCG?

[Edit 1 times, last edit by wujj123456 at Jun 20, 2021 2:22:36 AM]
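For completeness, the local concurrent limit is just an app_config.xml in the WCG project directory along these lines. This is a minimal sketch: I am using "arp1" as the ARP app name for illustration, so check the actual short name in client_state.xml before reusing it.

<app_config>
    <app>
        <name>arp1</name>
        <max_concurrent>12</max_concurrent>
    </app>
</app_config>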
wujj123456
Cruncher | Joined: Jun 9, 2010 | Post Count: 38 | Status: Offline
> I have been running WCG since 2006 and have never had this happen. One thing you could do is limit the number of work units in the profile. In other words do not have any projects listed with "unlimited." Cheers

Yeah, I looked into it a bit, but the maximum it lets me put is 64 for each project, which would hit projects with short WUs (like OPN) quite unfairly if I want to keep half a day or a day of buffer. Honestly, that's still better than getting too many WUs I can't finish. If there aren't other solutions, I will go and set a limit for each project so I can't possibly get 1K+ tasks again.
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1328 | Status: Offline
> I am curious if anyone else has experienced this too? I am trying to figure out if this is a bug on client or server, or if it's some mis-configuration on my side?
> Do you use an app_config.xml for WCG with a low <max_concurrent> value for one of the apps?
> Yes, I have it set for ARP tasks to manage memory usage. This is something I started doing a couple of weeks ago for WCG when ARP tasks suddenly became abundant, but it has been a while too.
> -----
> Was that a known problem for WCG?

Try a web preference limit of 12 tasks for ARP and disable/remove/rename app_config.xml for WCG, and see whether the overloading with tasks disappears.
wujj123456
Cruncher | Joined: Jun 9, 2010 | Post Count: 38 | Status: Offline
> Try a web preference limit of 12 tasks for ARP and disable/remove/rename app_config.xml for WCG, and see whether the overloading with tasks disappears.

I know that would work, but the problem is that it only allows 12 ARP tasks in total, whether they are running or not. With the relatively large WU buffer I want, ARP would get allocated far less time. I think I will go and put a limit on every project in the web profile and keep the local concurrent limit. I am still curious whether anyone has details on the exact bug causing this, and whether it is specific to WCG or to the client. (I kind of feel it's the latter, since it's the client asking to fetch work.)
Crystal Pellet
Veteran Cruncher | Joined: May 21, 2008 | Post Count: 1328 | Status: Offline
> I am still curious whether anyone has details on the exact bug causing this, and whether it is specific to WCG or to the client. (I kind of feel it's the latter, since it's the client asking to fetch work.)

In my opinion it's a BOINC client bug, but David Anderson is not convinced. It only happens when app_config is in use with max_concurrent at the app level. A work fetch is always triggered during the last three minutes of any running task. Be glad WCG has a request backoff of 121 seconds, otherwise the client would request unneeded work even more often.
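To put rough numbers on it: 3600 / 121 is about 30 scheduler requests per hour, so if each reply grants a few dozen tasks, a single hour of this behavior is already enough to go from the 75 tasks in the logs above to well over 1000.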
nanoprobe
Master Cruncher | Classified | Joined: Aug 29, 2008 | Post Count: 2998 | Status: Offline
Happened to me last night. BOINC downloaded 1000 OPN1 tasks on a 1-day buffer setting that should have fetched 200 max on that machine.
----------------------------------------
In 1969 I took an oath to defend and protect the U S Constitution against all enemies, both foreign and Domestic. There was no expiration date.
Mike.Gibson
Ace Cruncher | England | Joined: Aug 23, 2007 | Post Count: 12502 | Status: Offline
All projects are readily available, except HST, so you are unlikely to run out.
Set your project limit for ARP to 1 or 2 above your max_concurrent in app_config, which is best restricted to half your threads. For the others, set the limit to your total number of threads, unless that is more than 64.

Mike
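As an illustration only, with made-up numbers for a hypothetical 24-thread host rather than a recommendation for any particular machine: max_concurrent for ARP in app_config.xml = 12 (half the threads), web profile limit for ARP = 13 or 14 (1 or 2 above that), and web profile limit for each of the other projects = 24.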