| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 13
|
|
| Author |
|
|
EiF
Cruncher Joined: Nov 28, 2004 Post Count: 14 Status: Offline Project Badges:
|
My BOINC is set up to use network connectivity daily between 19:00 and 23:00, when I am at home. All works fine if I come home before 7pm: BOINC buffers completed tasks and uploads them at 19:00.
Today the laptop was running without internet until 19:37. BOINC made a couple of attempts to upload during this time. Then I put the laptop into standby and resumed it at home at 20:21. BOINC attempted to connect *immediately* at system resume. Obviously it did not succeed, as it took a few seconds for the network to become available (wifi authentication, DHCP, etc. you know ...). But that was too late for BOINC, and BOINC delayed the next attempt by more than 3 hours:

14.9.2011 19:00:00 Resuming network activity
14.9.2011 19:00:01 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:00:02 Project communication failed: attempting access to reference site
14.9.2011 19:00:02 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:00:02 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:00:04 BOINC can't access Internet - check network connection or proxy configuration.
14.9.2011 19:01:03 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:01:04 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:01:04 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:02:04 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:02:05 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:02:05 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:03:06 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:03:07 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:03:07 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:04:07 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:04:08 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:04:08 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:05:43 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:05:44 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:05:44 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:07:16 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:07:18 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:07:18 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:15:46 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:15:47 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:15:47 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:22:15 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:22:16 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:22:16 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:37:36 Windows is suspending operations
14.9.2011 19:37:37 Suspending network activity - user request
14.9.2011 20:21:26 Windows is resuming operations
14.9.2011 20:21:27 Resuming network activity
14.9.2011 20:21:27 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 20:21:29 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 20:21:29 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0

At 21:07:00 this was still the last message in the message log. The completed task was in the list of transfers, "ready to upload", with "project delay" set to 2 hours and 54 minutes. Why so much?

Why does BOINC attempt to connect to the network IMMEDIATELY after the system resumes from standby? If this attempt fails, why does BOINC wait 3 hours before trying a second time?

Thanks in advance for help. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi,
Had not realized that the network back-off would rise so quickly to 3 hours. For sure, I've always found it makes no sense to run the series of 1 minute intervals when a connection fails, particularly if the work in progress / cached far exceeds that value. Our house expert, Ingleside, may be able to tell what the reasoning is behind the mechanical logic... does not strike me as AI. When this happens on my Linux box and the WIFI fails, the running tasks start failing too, so I'm actually running with only a 30 minute window and manual connects when I'm at the machine and know the line is up.

For the moment, a workaround may be to narrow the window further, to 21:00-23:00, and increase connect/cache to, for instance, 1 day or more. On reconnecting, maybe set the "connect every" to, for instance, 30 minutes. Maybe that will stop these initial 1 minute attempts. Not tried it, but will test it here on a 6.12 client by pulling the Wifi dongle and report back.

--//--

edit: Just set my connect to 0.02 days (about every 30 minutes) and increased the cache, and nothing happened until I hit the update button. Then this showed in the log, with gradual increment.

1028 World Community Grid 14-09-2011 22:18 Scheduler request failed: Couldn't resolve host name
1029 World Community Grid 14-09-2011 22:18 [sched_op] Deferring communication for 1 min 36 sec
1030 World Community Grid 14-09-2011 22:18 [sched_op] Reason: Scheduler request failed
1031 14-09-2011 22:18 Project communication failed: attempting access to reference site
1032 14-09-2011 22:18 BOINC can't access Internet - check network connection or proxy configuration.
1064 World Community Grid 14-09-2011 22:20 [sched_op] Starting scheduler request
1065 World Community Grid 14-09-2011 22:20 Sending scheduler request: To fetch work.
1066 World Community Grid 14-09-2011 22:20 Requesting new tasks for CPU
1067 World Community Grid 14-09-2011 22:20 [sched_op] CPU work request: 37727.86 seconds; 0.00 CPUs
1068 World Community Grid 14-09-2011 22:20 Scheduler request failed: Couldn't resolve host name
1069 World Community Grid 14-09-2011 22:20 [sched_op] Deferring communication for 1 min 52 sec
1070 World Community Grid 14-09-2011 22:20 [sched_op] Reason: Scheduler request failed

[Edit 1 times, last edit by Former Member at Sep 14, 2011 8:25:04 PM] |
||
|
|
EiF
Cruncher Joined: Nov 28, 2004 Post Count: 14 Status: Offline Project Badges:
|
Hi Sekerob,
thanks for the quick response. I already had the cache set to 2 days, but the network connection set to "about every 1 day". I have now set it to 0.02 days and, as you said, nothing happens until I hit the update button or the back-off timer runs out. I'll try to reproduce the same scenario with these new settings tomorrow. Regards EiF |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
What is maybe part of the neurotic behavior with "connect every 1.0 days" is that BOINC gets pushy when there are many results to upload [and knows it won't get another chance until the next 24 hours have passed]. If the number of uploadable results exceeds 2x the number of cores on the computer, work fetch gets blocked until the uploads have cleared as well. I think there is better scheduling in 6.12; I certainly don't have any major complaints about the operation of the latest version.

Let us know what you find... btw, what client version was that?

--//--

edit: To confirm, you have the network set to 19:00-23:00 and the activity menu set to "network based on preferences". My reading of the opening post is that this is the case, but with the "connect every 1.0 days" I'm not sure. [Edit 1 times, last edit by Former Member at Sep 14, 2011 8:57:24 PM] |
||
|
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges:
|
The connection rules depend on the version used, but basically there are these different scenarios:

1: Can't get any connection at all; the connection to Google (or IBM) fails. In this scenario, BOINC will retry the connection fairly frequently, not sure if it's around once per minute or something, and AFAIK the "normal" backoff rules don't apply.

2: Problems connecting to the scheduling server. This gives an exponential backoff. After 10 failed connections, the client tries to read the project's homepage, and if this also gives an error it's possible the 24-hour backoff is still in place (*), but I haven't tested this recently. If it does successfully manage to download the homepage, the failure count is reset, and if the next scheduler request also fails it's back to low-end exponential backoff.

3: Problems connecting to the upload server or download server. This gives exponential backoff, but the number of failed connections has no influence here; that is just for the scheduling server. With v6.10.xx you've now also got the project-wide backoffs, in case of multiple errors in a row on upload or download.

4: With v6.6.xx and later, a work request gets a response from the scheduling server, but the project doesn't give you any work. This is most often seen with DDDT2, even during the periods where there should be work available; the most likely messages are "there was work but it was committed to other platforms" or "no work available"... These failures to get work give a "deferral interval" that is doubled for each consecutive failure, but is reset back to zero if a running task finishes or the user manually hits "update". Doesn't matter for WCG at the moment, but just to mention it: in case you've also got a GPU, you'll have a separate "deferral interval" for each resource type.

5: Apart from the client-side behaviour, there's also a possibility of some server-controlled deferrals. The most common is the deferral that controls how soon the client is allowed to do the next work request; for WCG this is given as 11 seconds. Most others are 1-hour deferrals in case of "can't open database" or similar.

One thing I've not mentioned yet is how large the exponential deferral is. This depends on the client version. With older clients, the exponential deferral starts at 1 minute and increases to a max of 4 hours, and it's a fairly slow increase, so in practice you'll often get multiple 1-minute deferrals before it finally starts to increase. Such short deferrals were too frequent, so with v6.12.xx the exponential deferral was changed and now uses the limits 10 minutes and 12 hours. For one thing the shortest deferral is 10x larger, and additionally the deferral increases much faster than before, so hitting 1 hour or something on the 2nd try is fairly normal.

The deferrals for failure to get work have also been changed. With older clients this started at one minute and was doubled, up to a max of 24 hours. In v6.12.xx it starts at 10 minutes and is doubled, up to a max of 24 hours. But, while 10 minutes is the shortest deferral interval, the actual deferral will AFAIK be between 0.5x and 2x (**) the deferral interval, meaning the shortest is 5 minutes - 20 minutes, and so upward for each doubling, but if I'm not mistaken you'll not exceed 24 hours.

In all cases, either manually hitting "update" or selecting a backed-off upload or download and hitting "retry" will immediately retry a transfer.

(*): With really old clients, failure to download the homepage gave a 1-week deferral...

(**): Possibly I misremember here, and it's 0.5x - 1.5x. In either case, the time chosen is within an interval around the deferral interval, which gets doubled each time the client says "there was work but it was committed to other platforms"...

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
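As a rough illustration of the work-fetch deferral described in this post, here is a minimal Python sketch. It assumes only what the post states for v6.12.xx: the nominal interval starts at 10 minutes, doubles on each consecutive failure, and is capped at 24 hours. The jitter factors (0.5x to 2x) are the poster's own uncertain recollection, so they are left as parameters. The function names are hypothetical and this is not BOINC source code.

```python
def nominal_deferrals(start_minutes=10.0, cap_minutes=24 * 60.0, failures=9):
    """Nominal deferral interval (minutes) after each consecutive failed work request.

    Assumption from the post: start at 10 minutes, double each time, cap at 24 hours.
    """
    intervals = []
    interval = start_minutes
    for _ in range(failures):
        intervals.append(min(interval, cap_minutes))
        interval *= 2
    return intervals


def jitter_range(interval, low=0.5, high=2.0, cap_minutes=24 * 60.0):
    """Range (minutes) in which the actual deferral may fall.

    low/high follow the post's recollection (0.5x to 2x, possibly 0.5x to 1.5x).
    The upper end is clipped to the 24-hour cap mentioned in the post.
    """
    return (interval * low, min(interval * high, cap_minutes))


if __name__ == "__main__":
    for n, interval in enumerate(nominal_deferrals(), start=1):
        lo, hi = jitter_range(interval)
        print(f"failure {n}: nominal {interval:.0f} min, actual {lo:.0f} to {hi:.0f} min")
```

Passing `high=1.5` to `jitter_range` gives the alternative the poster mentions in footnote (**).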
||
|
|
EiF
Cruncher Joined: Nov 28, 2004 Post Count: 14 Status: Offline Project Badges:
|
Hi,
I am using BOINC version 6.10.58 and I have the setting "network based on preferences". I changed the preferences to "connect about every 0.02 days" and the resulting behaviour slightly changed: Now the retry intervals were: 1min - 1min - 1min - 1min - 1min - 7min - 5min - 18min - 1h42min - 1h27min - 3h55min - 2h58min. Previously, with the "connect about every 1 day" setting, the intervals were 1 min - 1min - 1min - 1min - 1min - 7min - 7min - something less than 2 hours - and then 3h40min. It is not much different, it seems the scheduling indeed depends on other variables too, but at least the intervals now grow slightly slower. |
||
|
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges:
|
Hi, I am using BOINC version 6.10.58 and I have the setting "network based on preferences". I changed the preferences to "connect about every 0.02 days" and the resulting behaviour slightly changed: Now the retry intervals were: 1min - 1min - 1min - 1min - 1min - 7min - 5min - 18min - 1h42min - 1h27min - 3h55min - 2h58min. Previously, with the "connect about every 1 day" setting, the intervals were 1 min - 1min - 1min - 1min - 1min - 7min - 7min - something less than 2 hours - and then 3h40min. It is not much different, it seems the scheduling indeed depends on other variables too, but at least the intervals now grow slightly slower.

Hmm, let's see, exponential backoff in BOINC is... Ah, for v6.10.58, it was basically... random_number_between (60 seconds and e^N), and this in practice means:

N - backoff interval (minutes)
1 - 1
2 - 1
3 - 1
4 - 1
5 - 1 to 2.47
6 - 1 to 6.72
7 - 1 to 18.28
8 - 1 to 49.68
9 - 1 to 135.05
10 - 1 to 240

So, after 10 tries, you'll get a random backoff between 1 minute and 4 hours with v6.10.xx. Also, there are always 4 tries initially with only 1 minute between retries, something that's much too frequent in case of server problems.

As a comparison, with v6.12.xx the code is basically this:

2^N * (0.5 + 0.5 * random_number)

N is the retry count, while random_number is a number between zero and 1. Additionally the min of 10 minutes and the max of 12 hours come into play. This means you'll have:

N - backoff interval
1 - 10 to 20 minutes
2 - 20 to 40 minutes
3 - 40 minutes to 1.33 hours
4 - 1.33 to 2.67 hours
5 - 2.67 to 5.33 hours
6 - 5.33 to 10.67 hours
7 - 6 to 12 hours

So, after 7 attempts, you'll get a backoff between 6 hours and 12 hours, if you're running v6.12.xx.

In either case, the "Connect about every ... days" setting has no effect on the backoff. But caching behaviour is influenced by it, and with your computer basically being offline for close to 24 hours per day, I'd recommend using a minimum of 1 day for this setting.
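The two tables in this post can be reproduced with a short Python sketch. It is only a reading of the description here, not BOINC source code: for v6.10.xx it assumes the upper bound is e^N seconds, floored at 60 seconds and capped at 4 hours; for v6.12.xx it assumes the base is 10 minutes * 2^N, capped at 12 hours, then multiplied by a factor between 0.5 and 1. That second assumption is inferred from the tabulated ranges. All function names are hypothetical.

```python
import math
import random


def old_backoff_range(n, floor_s=60.0, cap_s=4 * 3600.0):
    """v6.10.xx as described: random between 60 s and e^N seconds, max 4 hours."""
    upper = max(floor_s, min(math.exp(n), cap_s))
    return (floor_s, upper)


def new_backoff_range(n, min_s=600.0, cap_s=12 * 3600.0):
    """v6.12.xx as tabulated: min(10 min * 2^N, 12 h) scaled by (0.5 .. 1.0)."""
    base = min(min_s * 2 ** n, cap_s)
    return (0.5 * base, base)


def sample(range_s, rng=random):
    """Draw one backoff (seconds) uniformly from a (low, high) range."""
    low, high = range_s
    return rng.uniform(low, high)


if __name__ == "__main__":
    print("N  v6.10.xx (min)        v6.12.xx (min)")
    for n in range(1, 11):
        old_lo, old_hi = old_backoff_range(n)
        new_lo, new_hi = new_backoff_range(n)
        print(f"{n:<2} {old_lo / 60:.2f} to {old_hi / 60:<10.2f} "
              f"{new_lo / 60:.0f} to {new_hi / 60:.0f}")
```

Under these assumptions, the 2h54min delay reported in the opening post sits inside the 1 minute to 4 hours range that applies once the retry count has reached 10.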
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." [Edit 1 times, last edit by Ingleside at Sep 16, 2011 12:05:30 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Problem is, if you set "connect every" to 1 day, his variable window of *actual* connect time may cause the client to not connect at all. I'll stick to the small/zero connect interval and "additional buffer" setting modus operandi, as I've been running that way for a very long time. As for *theory* on 6.12.33, I've posted my practical observation earlier in this thread... small initial, to-the-second exact increments.

So, have you got a fix? I think I have, in the form of a simple boinccmd tool script that runs off the OS scheduler. Ludicrous that we have to go through these gyrations to be able to run set and forget... this is how to make maximized volunteer computing really hard.

--//--

edit: boinccmd sequence would be roughly:
- Set BOINC to allow network always
- Force a project update
- Set a script sleep for 30 minutes
- Set BOINC to suspend network.

[Edit 1 times, last edit by Former Member at Sep 16, 2011 12:23:04 PM] |
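The four-step sequence above could look roughly like the following Python sketch, meant to be started by the OS scheduler (cron, Windows Task Scheduler) at a time when the line is known to be up. It relies on the `boinccmd --set_network_mode` and `boinccmd --project <URL> update` operations. The project URL, the assumption that `boinccmd` is on the PATH and can authenticate to the local client, and the helper names are all assumptions for illustration, not something confirmed in this thread.

```python
import subprocess
import sys
import time

# Assumption: the master URL the client is attached with; check it in your BOINC Manager.
PROJECT_URL = "http://www.worldcommunitygrid.org/"


def build_sequence(project_url=PROJECT_URL, wait_minutes=30):
    """Return the steps from the post as (kind, argument) pairs."""
    return [
        ("run", ["boinccmd", "--set_network_mode", "always"]),      # allow network always
        ("run", ["boinccmd", "--project", project_url, "update"]),  # force a project update
        ("sleep", wait_minutes * 60),                               # give transfers time to finish
        ("run", ["boinccmd", "--set_network_mode", "never"]),       # suspend network again
    ]


def execute(sequence, run=subprocess.run, sleep=time.sleep):
    """Carry out the steps in order; run/sleep are injectable for dry runs."""
    for kind, arg in sequence:
        if kind == "run":
            run(arg, check=True)  # raises if boinccmd reports an error
        else:
            sleep(arg)


if __name__ == "__main__" and "--run" in sys.argv:
    # Only touches the BOINC client when explicitly started with --run.
    execute(build_sequence())
```

For a setup like EiF's, the last step could use `auto` instead of `never`, which hands network control back to the time-of-day preferences rather than suspending it outright.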
||
|
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges:
|
Problem is, if you set "connect every" to 1 day, his variable window of *actual* connect time may cause the client to not connect at all.

Hmm, and why shouldn't the client connect...

I'll stick to the small/zero connect interval and "additional buffer" setting modus operandi as I've been running for a very long time. As for *theory* on 6.12.33, practical observation I've posted before in this thread... small initial, to the second exact increments.

Well, the last part at least is very easy to answer, since your log shows... "BOINC can't access Internet". So, based on rule #1 in my earlier post in the thread, whatever is "deferred" doesn't follow the normal rules, and is no good indication of the behaviour when the internet is accessible.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"BOINC can't access Internet "
Which is the root of the problem, for as long as the condition persists. EiF and I are looking for a workaround, some optimal settings, so that BOINC can report and fetch work without hitting that auto-backoff of 3-4 hours or more when the line is actually up, and without us having to micromanage. --//-- |
||
|
|
|