World Community Grid - View Thread - Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?


Thread Status: Active
Total posts in this thread: 13


This topic has been viewed 2364 times and has 12 replies

EiF
Cruncher
Joined: Nov 28, 2004
Post Count: 14
Status: Offline

Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

My BOINC is set up to use network connectivity daily between 19:00 and 23:00, when I am at home. All works fine if I come home before 7pm: BOINC buffers completed tasks and uploads them at 19:00.

Today the laptop was running without internet until 19:37. BOINC made a couple of attempts to upload during this time. Then I put the laptop into standby and resumed it at home at 20:21. BOINC attempted to connect *immediately* on system resume. Obviously it did not succeed, as it took a few seconds for the network to become available (WiFi authentication, DHCP, etc.). But that was too late for BOINC, which delayed the next attempt by more than 3 hours:

14.9.2011 19:00:00 Resuming network activity
14.9.2011 19:00:01 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:00:02 Project communication failed: attempting access to reference site
14.9.2011 19:00:02 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:00:02 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:00:04 BOINC can't access Internet - check network connection or proxy configuration.
14.9.2011 19:01:03 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:01:04 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:01:04 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:02:04 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:02:05 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:02:05 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:03:06 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:03:07 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:03:07 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:04:07 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:04:08 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:04:08 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:05:43 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:05:44 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:05:44 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:07:16 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:07:18 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:07:18 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:15:46 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:15:47 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:15:47 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:22:15 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:22:16 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 19:22:16 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 19:37:36 Windows is suspending operations
14.9.2011 19:37:37 Suspending network activity - user request
14.9.2011 20:21:26 Windows is resuming operations
14.9.2011 20:21:27 Resuming network activity
14.9.2011 20:21:27 World Community Grid Started upload of DSFL_00000010_0000032_0393_0_0
14.9.2011 20:21:29 World Community Grid Temporarily failed upload of DSFL_00000010_0000032_0393_0_0: can't resolve hostname
14.9.2011 20:21:29 World Community Grid Backing off 1 min 0 sec on upload of DSFL_00000010_0000032_0393_0_0

At 21:07:00 this was still the last message in the message log. The completed task was in the list of transfers, "ready to upload" with "project delay" set to 2 hours and 54 minutes. Why so much?

Why does BOINC attempt to connect to the network IMMEDIATELY after the system resumes from standby?
If this attempt fails, why does BOINC wait 3 hours before trying a second time?

Thanks in advance for help.

[Sep 14, 2011 7:41:00 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

Hi,

I had not realized that the network back-off would rise so quickly to 3 hours. I've always found that running a series of 1-minute retries after a failed connection makes no sense, particularly if the work in progress or cached far exceeds that interval. Our house expert, Ingleside, may be able to explain the logic behind the mechanism... it does not strike me as AI. When this happens on my Linux box and the WiFi fails, the running tasks start failing too, so I'm actually running with only a 30-minute window, plus manual connects when I'm at the machine and know the line is up.

For the moment, a workaround may be to narrow the window further, to 21:00-23:00, and to increase the connect/cache setting to, for instance, 1 day or more.

On reconnecting, maybe set "connect every" to, for instance, 30 minutes. Maybe that will stop these initial 1-minute attempts. I have not tried it, but I will test it here on a 6.12 client by pulling the WiFi dongle, and report back.

--//--

edit: I just set my connect interval to 0.02 days (about every 30 minutes) and increased the cache, and nothing happened until I hit the update button. Then this showed up in the log, with gradually increasing deferrals.

1028 World Community Grid 14-09-2011 22:18 Scheduler request failed: Couldn't resolve host name
1029 World Community Grid 14-09-2011 22:18 [sched_op] Deferring communication for 1 min 36 sec
1030 World Community Grid 14-09-2011 22:18 [sched_op] Reason: Scheduler request failed
1031 14-09-2011 22:18 Project communication failed: attempting access to reference site
1032 14-09-2011 22:18 BOINC can't access Internet - check network connection or proxy configuration.
1064 World Community Grid 14-09-2011 22:20 [sched_op] Starting scheduler request
1065 World Community Grid 14-09-2011 22:20 Sending scheduler request: To fetch work.
1066 World Community Grid 14-09-2011 22:20 Requesting new tasks for CPU
1067 World Community Grid 14-09-2011 22:20 [sched_op] CPU work request: 37727.86 seconds; 0.00 CPUs
1068 World Community Grid 14-09-2011 22:20 Scheduler request failed: Couldn't resolve host name
1069 World Community Grid 14-09-2011 22:20 [sched_op] Deferring communication for 1 min 52 sec
1070 World Community Grid 14-09-2011 22:20 [sched_op] Reason: Scheduler request failed

----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 14, 2011 8:25:04 PM]

[Sep 14, 2011 8:16:07 PM]

EiF
Cruncher
Joined: Nov 28, 2004
Post Count: 14
Status: Offline


Re: Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

Hi Sekerob,

Thanks for the quick response. I already had the cache set to 2 days, but the network connection interval set to "about every 1 day".

I have now set it to 0.02 days and, as you said, nothing happens until I hit the update button or the back-off timer runs out.

I'll try to reproduce the same scenario with these new settings tomorrow.

Regards
EiF

[Sep 14, 2011 8:44:17 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

Part of the neurotic behavior with "connect every 1.0 days" may be that BOINC gets pushy when there are many results to upload [and knows it won't get another chance until the next 24 hours have passed]. If the number of results waiting to upload exceeds 2x the number of cores on the computer, work fetch also gets blocked until the uploads have cleared. I think there is better scheduling in 6.12; I certainly don't have any major complaints about the operation of the latest version.

Let us know what you find... btw, what client version was that?

--//--

edit: To confirm, you have the network set to 19:00-23:00 and the activity menu set to "network based on preferences"? My reading of the opening post is that this is the case, but with the "connect every 1.0 days" setting I'm not sure.

----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 14, 2011 8:57:24 PM]

[Sep 14, 2011 8:52:58 PM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline

Re: Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

The connection rules depend on the version used, but basically there are these different scenarios:
1: Can't get any connection at all; the connection to Google (or IBM) fails. In this scenario, BOINC will retry the connection fairly frequently (not sure if it's around once per minute or something), and AFAIK the "normal" backoff rules don't apply.

2: Problems connecting to the scheduling server. This gives an exponential backoff. After 10 failed connections, the client tries to read the project's homepage, and if this also gives an error it's possible the 24-hour backoff is still in place (*), but I haven't tested this recently. If it does successfully manage to download the homepage, the failure count is reset, and if the next scheduler request also fails it's back to low-end exponential backoff.

3: Problems connecting to the upload server or download server. This gives exponential backoff, but the number of failed connections has no influence here; that count applies only to the scheduling server. With v6.10.xx you've now also got the project-wide backoffs, in case of multiple errors in a row on upload or download.

4: With v6.6.xx and later, a work request gets a response from the scheduling server, but the project doesn't give you any work. This is most often seen with DDDT2, even during the periods when there should be work available; the most likely messages are "there was work but it was committed to other platforms" or "no work available"... These failures to get work give a "deferral interval" that is doubled for each consecutive failure, but is reset back to zero if a running task finishes or the user manually hits "update". It doesn't matter for WCG at the moment, but just to mention it: in case you've also got a GPU, you'll have a separate "deferral interval" for each resource type.

5: Apart from the client-side behaviour, there's also a possibility of some server-controlled deferrals. The most common is the deferral that controls how soon the client is allowed to make its next work request; for WCG this is given as 11 seconds. Most others are 1-hour deferrals in case of "can't open database" or similar.

One thing I've not mentioned yet is how large the exponential deferral is. This depends on the client version. With older clients, the exponential deferral starts at 1 minute and increases up to a max of 4 hours, and the increase is fairly slow, so in practice you'll often get multiple 1-minute deferrals before it finally starts to increase.

Such short deferrals were too frequent, so with v6.12.xx the exponential deferral was changed, and it now uses the limits 10 minutes and 12 hours. For one thing the shortest deferral is 10x larger, and additionally the deferral increases much faster than before, so hitting 1 hour or something on the 2nd try is fairly normal.

The deferrals for failure to get work have also been changed. With older clients this started at one minute and was doubled, up to a max of 24 hours. In v6.12.xx it starts at 10 minutes and is doubled, up to a max of 24 hours. But while 10 minutes is the shortest deferral interval, the actual deferral will AFAIK be between 0.5x and 2x (**) the deferral interval, meaning the shortest is 5 to 20 minutes, and so on upward for each doubling, but if I'm not mistaken you'll not exceed 24 hours.

In all cases, either manually hitting "update" or selecting a backed-off upload or download and hitting "retry" will immediately retry a transfer.

(*): With really old clients, failure to download the homepage gave a 1-week deferral...
(**): Possibly I misremember here, and it's 0.5x - 1.5x. In either case, the time chosen is within an interval around the deferral interval, which gets doubled each time the client says "there was work but it was committed to other platforms"...
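
The work-fetch deferral described above (v6.12.xx: start at 10 minutes, double per consecutive failure, cap at 24 hours, randomize around the interval) could be sketched roughly as below. This is only an illustration of the rule as stated in this post, not BOINC's actual code. The function name and parameters are made up for the example, and the jitter bounds default to the 0.5x - 2x guess, which the footnote says may really be 0.5x - 1.5x.

```python
import random

MIN_DEFERRAL_S = 10 * 60.0    # 10 minutes, per the post (v6.12.xx)
MAX_DEFERRAL_S = 24 * 3600.0  # 24 hours, per the post


def work_fetch_deferral(failures, jitter=(0.5, 2.0), rng=random):
    """Return a deferral in seconds after `failures` consecutive
    "no work" replies. Hypothetical helper, not a BOINC function.

    failures == 0 models the reset after a task finishes or the
    user hits "update".
    """
    if failures < 1:
        return 0.0
    # The deferral interval doubles for each consecutive failure, up to the cap.
    interval = min(MIN_DEFERRAL_S * 2 ** (failures - 1), MAX_DEFERRAL_S)
    # The actual deferral is drawn from a band around that interval.
    low, high = jitter
    return min(interval * rng.uniform(low, high), MAX_DEFERRAL_S)
```

With the default bounds the first deferral falls between 5 and 20 minutes, matching the range given above; passing jitter=(0.5, 1.5) models the alternative from the footnote.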

----------------------------------------

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

[Sep 14, 2011 9:38:43 PM]

EiF
Cruncher
Joined: Nov 28, 2004
Post Count: 14
Status: Offline


Re: Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

Hi,
I am using BOINC version 6.10.58 and I have the setting "network based on preferences". I changed the preferences to "connect about every 0.02 days" and the resulting behaviour slightly changed:

Now the retry intervals were: 1min - 1min - 1min - 1min - 1min - 7min - 5min - 18min - 1h42min - 1h27min - 3h55min - 2h58min.

Previously, with the "connect about every 1 day" setting, the intervals were 1 min - 1min - 1min - 1min - 1min - 7min - 7min - something less than 2 hours - and then 3h40min.

It is not much different, it seems the scheduling indeed depends on other variables too, but at least the intervals now grow slightly slower.
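
For anyone who wants to read such retry intervals off the message log instead of by eye, here is a small sketch. It assumes log lines in the format shown in the first post (date as day.month.year, then time, then the message) and only looks at "Started upload" lines; the function name is made up for the example.

```python
from datetime import datetime


def upload_retry_gaps(log_lines):
    """Return the gaps, in seconds, between consecutive
    "Started upload" messages in a BOINC message log."""
    starts = []
    for line in log_lines:
        if "Started upload" not in line:
            continue
        # The first two whitespace-separated fields are date and time.
        date_s, time_s = line.split()[:2]
        stamp = datetime.strptime(f"{date_s} {time_s}", "%d.%m.%Y %H:%M:%S")
        starts.append(stamp)
    return [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
```

Note that the gaps measured this way include the time each failed attempt took, so they come out a second or two longer than the back-off itself.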

[Sep 16, 2011 8:08:52 AM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline


Re: Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

Hmm, let's see, exponential backoff in BOINC is...

Ah, for v6.10.58, it was basically...
random_number_between (60 seconds and e^N), and this in practice means:

N - backoff interval (minutes)
1 - 1
2 - 1
3 - 1
4 - 1
5 - 1 to 2.47
6 - 1 to 6.72
7 - 1 to 18.28
8 - 1 to 49.68
9 - 1 to 135.05
10 - 1 to 240

So, after 10 tries, you'll get a random backoff between 1 minute and 4 hours with v6.10.xx. Also, there's always 4 tries initially with only 1 minute between retries, something that's much too frequent in case of server-problems.
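
The v6.10.xx rule can be sketched as follows. This is a reading of the formula and table in this post, not the client's source: it assumes e^N is in seconds, that 60 seconds is the floor and that 4 hours is the ceiling. The function names are made up for the example.

```python
import math
import random

MIN_BACKOFF_S = 60.0        # 1 minute floor, per the post
MAX_BACKOFF_S = 4 * 3600.0  # 4 hour ceiling, per the post


def old_backoff_range(n):
    """Return (low, high) in seconds for retry count n."""
    # e^n seconds, never below the floor and never above the ceiling.
    high = min(max(math.exp(n), MIN_BACKOFF_S), MAX_BACKOFF_S)
    return MIN_BACKOFF_S, high


def old_backoff(n, rng=random):
    """Draw one backoff, in seconds, uniformly from the range for n."""
    low, high = old_backoff_range(n)
    return rng.uniform(low, high)


if __name__ == "__main__":
    for n in range(1, 11):
        low, high = old_backoff_range(n)
        print(f"{n:2d} - {low / 60:.2f} to {high / 60:.2f} minutes")
```

Because e^4 is still under 60 seconds, the first four retries are pinned to 1 minute, which is where the initial run of 1-minute attempts comes from. The log in the opening post shows ten failed upload attempts, so if the failure counter matched those, a delay of anything up to 4 hours would be within this range.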

As a comparison, with v6.12.xx the code is basically this:
2^N * (0.5 + 0.5 * random_number)

N is the retry count, while random_number is a number between zero and 1. Additionally, the min of 10 minutes and the max of 12 hours come into play.

This means you'll have:
N - backoff interval
1 - 10 to 20 minutes
2 - 20 to 40 minutes
3 - 40 minutes to 1.33 hours
4 - 1.33 to 2.67 hours
5 - 2.67 to 5.33 hours
6 - 5.33 to 10.67 hours
7 - 6 to 12 hours

So, after 7 attempts, you'll get a backoff between 6 hours and 12 hours, if you're running v6.12.xx.
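
A sketch of the v6.12.xx rule that reproduces the table above. To match the table, it assumes the 10-minute minimum is the scale factor multiplied by 2^N and that the 12-hour maximum is applied before the random factor; both are inferred from the numbers in this post, not taken from the client source. The function names are made up for the example.

```python
import random

MIN_BACKOFF_S = 10 * 60.0    # 10 minutes, per the post
MAX_BACKOFF_S = 12 * 3600.0  # 12 hours, per the post


def new_backoff_range(n):
    """Return (low, high) in seconds for retry count n."""
    # Scale 2^n by the minimum, then cap at the maximum.
    base = min(MIN_BACKOFF_S * 2 ** n, MAX_BACKOFF_S)
    # The random factor (0.5 + 0.5 * r) spans half the base up to the base.
    return 0.5 * base, base


def new_backoff(n, rng=random):
    """Draw one backoff, in seconds, for retry count n."""
    base = min(MIN_BACKOFF_S * 2 ** n, MAX_BACKOFF_S)
    return base * (0.5 + 0.5 * rng.random())


if __name__ == "__main__":
    for n in range(1, 8):
        low, high = new_backoff_range(n)
        print(f"{n} - {low / 60:.0f} to {high / 60:.0f} minutes")
```

Unlike the older rule, the lower bound rises together with the upper bound here, so a client that has failed several times can no longer draw a 1-minute retry.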

In either case, the "Connect about every ... days" setting has no effect on the backoff. But caching behaviour is influenced by it, and with your computer basically being offline for close to 24 hours per day, I'd recommend using a minimum of 1 day for this setting.


----------------------------------------
[Edit 1 times, last edit by Ingleside at Sep 16, 2011 12:05:30 PM]

[Sep 16, 2011 12:03:26 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

Problem is, if you set "connect every" to 1 day, his variable window of *actual* connect time may cause the client to not connect at all. I'll stick to the small/zero connect interval and "additional buffer" setting modus operandi as I've been running for a very long time. As for *theory* on 6.12.33, practical observation I've posted before in this thread... small initial, to the second exact increments.

So, have you got a fix? I think I have one, in the form of a simple boinccmd script that runs off the OS scheduler. It's ludicrous that we have to go through these gyrations to be able to run set-and-forget... a lesson in how to make maximized volunteer computing really hard.

--//--

edit: boinccmd sequence would be roughly:

- Set BOINC to allow network to always
- Force a project update
- Set a script sleep for 30 minutes
- Set BOINC to suspend network.
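
The sequence above might look roughly like this as a script to be launched by the OS scheduler. It is a sketch under assumptions: boinccmd is on the PATH and can reach the local client without extra options (depending on the setup it may need --host or --passwd), and the project URL passed in is the one the client is attached with (boinccmd --get_project_status lists it). The function names are made up for the example.

```python
import subprocess
import time

BOINCCMD = "boinccmd"  # assumption: boinccmd is on the PATH


def window_commands(project_url):
    """Return the boinccmd calls that open and close the network window."""
    open_cmds = [
        # Allow network activity regardless of preferences.
        [BOINCCMD, "--set_network_mode", "always"],
        # Force a project update (report finished work, fetch new work).
        [BOINCCMD, "--project", project_url, "update"],
    ]
    # Suspend network activity again.
    close_cmd = [BOINCCMD, "--set_network_mode", "never"]
    return open_cmds, close_cmd


def run_network_window(project_url, window_s=30 * 60,
                       run=subprocess.run, sleep=time.sleep):
    """Open the network, force an update, wait, then suspend the network."""
    open_cmds, close_cmd = window_commands(project_url)
    try:
        for cmd in open_cmds:
            run(cmd, check=True)
        sleep(window_s)
    finally:
        # Always try to close the window, even if an earlier step failed.
        run(close_cmd, check=True)
```

Calling run_network_window with the attached project's URL from a scheduled job, timed for when the line is known to be up, would give the 30-minute window from the list. The run and sleep parameters exist only so the sequence can be tried out without touching a real client.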

----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 16, 2011 12:23:04 PM]

[Sep 16, 2011 12:17:27 PM]

Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline


Re: Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

Problem is, if you set "connect every" to 1 day, his variable window of *actual* connect time may cause the client to not connect at all.

Hmm, and why shouldn't the client connect...?

I'll stick to the small/zero connect interval and "additional buffer" setting modus operandi as I've been running for a very long time. As for *theory* on 6.12.33, practical observation I've posted before in this thread... small initial, to the second exact increments.

Well, the last part at least is very easy to answer, since your log shows...
"BOINC can't access Internet "

So, based on rule #1 in my earlier post in this thread, whatever was "deferred" doesn't follow the normal rules, and is no good indication of the behaviour when the internet is accessible.


[Sep 16, 2011 1:03:26 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Why BOINC attempts to connect to network IMMEDIATELY after system resume from standby and does not try again?

"BOINC can't access Internet "

Which is the root of the problem, for as long as the condition persists. EiF and the rest of us are looking for a workaround, or optimal settings, so that BOINC can report and fetch work without hitting that auto-backoff of 3-4 hours or more when the line is actually up, and WITHOUT having to micromanage.

--//--

[Sep 16, 2011 1:08:55 PM]
