Thread Status: Active
Total posts in this thread: 20
This topic has been viewed 107234 times and has 19 replies
launila@gmail.com
Cruncher
Joined: Aug 8, 2006
Post Count: 7
Status: Offline
Computation error when missing network connection

Hi

My friend is having the following problem with BOINC under Linux. We have already searched the Internet for solutions, including this forum, but haven't found exactly the same problem or a solution for it.

If there is no working network connection when a task finishes, the task shows a computation error. The problem does not exist under Windows. He is running BOINC on many computers, and on every one of them this problem has existed for about two years. It also happens on different Linux distributions.

If a result is marked as OK before the network connection is lost, there is no problem as long as the retry button under the Transfers tab is not pressed. If there is a working network connection all the time, there are no problems.

I remember that old BOINC versions could run without a network connection and reported all tasks fine once the network was plugged back in. On Windows that still works very well.

This problem has existed so long that we are wondering: is a fix coming, or are the default BOINC settings wrong on all current Linux BOINC versions?

The reason it is not always possible to keep a good network connection is that mobile Internet connections sometimes have problems. If a long file transfer is running, it may also take a long time to open new connections under Linux, so BOINC cannot upload jobs immediately. Also, the computers are sometimes used for a few days in places where there is no Internet.

Here are some log rows:
Sat 20 Aug 2011 07:34:15 AM EEST World Community Grid Computation for task X0000123330305201010041110_0 finished
Sat 20 Aug 2011 07:34:15 AM EEST World Community Grid Output file X0000123330305201010041110_0_0 for task X0000123330305201010041110_0 absent
Sat 20 Aug 2011 07:34:15 AM EEST World Community Grid Starting X0000123330301201010041111_0
Sat 20 Aug 2011 07:34:15 AM EEST World Community Grid Starting task X0000123330301201010041111_0 using hcc1 version 640
Sat 20 Aug 2011 07:34:15 AM EEST World Community Grid Scheduler request failed: Couldn't resolve host name
Sat 20 Aug 2011 07:34:16 AM EEST World Community Grid Computation for task X0000123330302201010041111_0 finished
Sat 20 Aug 2011 07:34:16 AM EEST World Community Grid Output file X0000123330302201010041111_0_0 for task X0000123330302201010041111_0 absent
Sat 20 Aug 2011 07:34:16 AM EEST World Community Grid Starting X0000123330295201010041111_0
Sat 20 Aug 2011 07:34:16 AM EEST World Community Grid Starting task X0000123330295201010041111_0 using hcc1 version 640
Sat 20 Aug 2011 07:34:45 AM EEST Project communication failed: attempting access to reference site
Sat 20 Aug 2011 07:36:06 AM EEST BOINC can't access Internet - check network connection or proxy configuration.
Sat 20 Aug 2011 07:36:06 AM EEST World Community Grid Computation for task X0000123330301201010041111_0 finished
Sat 20 Aug 2011 07:36:06 AM EEST World Community Grid Output file X0000123330301201010041111_0_0 for task X0000123330301201010041111_0 absent
Sat 20 Aug 2011 07:36:06 AM EEST World Community Grid Starting X0000123330285201010041111_0
Sat 20 Aug 2011 07:36:06 AM EEST World Community Grid Starting task X0000123330285201010041111_0 using hcc1 version 640
Sat 20 Aug 2011 07:36:08 AM EEST World Community Grid Computation for task X0000123330295201010041111_0 finished
Sat 20 Aug 2011 07:36:08 AM EEST World Community Grid Output file X0000123330295201010041111_0_0 for task X0000123330295201010041111_0 absent
Sat 20 Aug 2011 07:36:08 AM EEST World Community Grid Starting X0000123330284201010041111_0
Sat 20 Aug 2011 07:36:08 AM EEST World Community Grid Starting task X0000123330284201010041111_0 using hcc1 version 640
Sat 20 Aug 2011 07:37:12 AM EEST World Community Grid Sending scheduler request: To fetch work.
Sat 20 Aug 2011 07:37:12 AM EEST World Community Grid Reporting 4 completed tasks, requesting new tasks
Sat 20 Aug 2011 07:38:33 AM EEST World Community Grid Computation for task X0000123330285201010041111_0 finished
Sat 20 Aug 2011 07:38:33 AM EEST World Community Grid Output file X0000123330285201010041111_0_0 for task X0000123330285201010041111_0 absent
Sat 20 Aug 2011 07:38:33 AM EEST World Community Grid Starting X0000123330280201010041111_0
Sat 20 Aug 2011 07:38:33 AM EEST World Community Grid Starting task X0000123330280201010041111_0 using hcc1 version 640
Sat 20 Aug 2011 07:38:33 AM EEST World Community Grid Scheduler request failed: Couldn't resolve host name
Sat 20 Aug 2011 07:38:34 AM EEST World Community Grid Computation for task X0000123330284201010041111_0 finished
Sat 20 Aug 2011 07:38:34 AM EEST World Community Grid Output file X0000123330284201010041111_0_0 for task X0000123330284201010041111_0 absent
Sat 20 Aug 2011 07:38:34 AM EEST World Community Grid Starting X0000123330271201010041111_1
Sat 20 Aug 2011 07:38:34 AM EEST World Community Grid Starting task X0000123330271201010041111_1 using hcc1 version 640
Sat 20 Aug 2011 07:39:39 AM EEST World Community Grid Sending scheduler request: To fetch work.
Sat 20 Aug 2011 07:39:39 AM EEST World Community Grid Reporting 6 completed tasks, requesting new tasks
Sat 20 Aug 2011 07:41:00 AM EEST World Community Grid Computation for task X0000123330280201010041111_0 finished
Sat 20 Aug 2011 07:41:00 AM EEST World Community Grid Output file X0000123330280201010041111_0_0 for task X0000123330280201010041111_0 absent
Sat 20 Aug 2011 07:41:00 AM EEST World Community Grid Starting X0000123330265201010041111_0
Sat 20 Aug 2011 07:41:00 AM EEST World Community Grid Starting task X0000123330265201010041111_0 using hcc1 version 640
Sat 20 Aug 2011 07:41:00 AM EEST World Community Grid Scheduler request failed: Couldn't resolve host name
Sat 20 Aug 2011 07:41:01 AM EEST World Community Grid Computation for task X0000123330271201010041111_1 finished
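
To tally how many tasks were hit, rows like the ones above can be scanned for the "Output file ... absent" and "Couldn't resolve host name" messages. What follows is only a minimal Python sketch, assuming the log has been saved as plain text. The names `summarize` and `SAMPLE_LOG` are made up for illustration, and the sample rows are copied from the log above.

```python
import re

# Sample rows copied from the log above. A real run would read the saved
# BOINC event log instead (its location is not given in this thread).
SAMPLE_LOG = """\
Sat 20 Aug 2011 07:34:15 AM EEST World Community Grid Output file X0000123330305201010041110_0_0 for task X0000123330305201010041110_0 absent
Sat 20 Aug 2011 07:34:15 AM EEST World Community Grid Scheduler request failed: Couldn't resolve host name
Sat 20 Aug 2011 07:34:16 AM EEST World Community Grid Output file X0000123330302201010041111_0_0 for task X0000123330302201010041111_0 absent
Sat 20 Aug 2011 07:36:06 AM EEST BOINC can't access Internet - check network connection or proxy configuration.
"""

# Matches the "Output file <file> for task <task> absent" message.
ABSENT = re.compile(r"Output file \S+ for task (\S+) absent")
DNS_FAILURE = "Couldn't resolve host name"


def summarize(log_text: str) -> dict:
    """Collect tasks whose output file was absent and count DNS failures."""
    failed_tasks = []
    dns_failures = 0
    for line in log_text.splitlines():
        match = ABSENT.search(line)
        if match:
            failed_tasks.append(match.group(1))
        if DNS_FAILURE in line:
            dns_failures += 1
    return {"failed_tasks": failed_tasks, "dns_failures": dns_failures}


if __name__ == "__main__":
    summary = summarize(SAMPLE_LOG)
    print(len(summary["failed_tasks"]), "tasks lost their output file")
    print(summary["dns_failures"], "DNS resolution failures")
```

Comparing the timestamps of the two message types would show whether the failed tasks cluster around the name resolution failures, but this sketch only counts them.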

[Aug 20, 2011 7:18:45 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation error when missing network connection

Hi,

Not looked hard enough as I've brought up the issue multiple times ;D

WIFI by any chance? Yes, I'm meticulous about setting BOINC network activity off-line on my Linux box and connecting on schedule for 30 minutes towards the end of the UTC day, so it can upload and fetch. I do brief manual connects if I need to be at that box, then forget to suspend networking in BOINC, and at times get a series of 3 to 5 jobs failing when a task finishes. The problem has persisted through v 6.12.33. Lost me about 24 hours of computing time this week at 3:05AM per the logs. When uploading / fetching over an unstable mobile connection, you might want to suspend computing during that phase. Not done that here, but I could schedule that half hour of networking to not compute. 95% of the time things go fine [out of luck once every 3-4 weeks], so I'll let it ride, since with 4 cores a half-hour suspend is 2 hours of computing time :D

Also have set WIFI on Linux to not do power management with the "sudo iwconfig wlan0 power off" command [wlan0 can also be wlan1, wlan2, depending on how many WIFI connections were ever configured]. Power management has been broken on Linux since about 10.04-10.10, but is supposed to be fixed in the next release, 11.10. We'll see and then believe. Turning power management off ensures maximum transmission speed.
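
Before running the "power off" command, it can help to check whether power management is actually on for the interface. The following is only a rough Python sketch: it parses iwconfig-style text for the "Power Management" field. The sample output is illustrative, not captured from any machine in this thread, and the function names are made up for the example.

```python
import re

# Illustrative iwconfig-style output, NOT captured from a real machine.
# On a live system the text could come from something like:
#   subprocess.run(["iwconfig", "wlan0"], capture_output=True, text=True).stdout
SAMPLE_IWCONFIG = """\
wlan0     IEEE 802.11bgn  ESSID:"example"
          Bit Rate=270 Mb/s   Tx-Power=20 dBm
          Power Management:on
"""


def power_management_state(iwconfig_output: str):
    """Return 'on' or 'off' if a Power Management field is present, else None."""
    match = re.search(r"Power Management:\s*(on|off)", iwconfig_output)
    return match.group(1) if match else None


def needs_power_off(iwconfig_output: str) -> bool:
    """True only when the output explicitly reports power management as on."""
    return power_management_state(iwconfig_output) == "on"


if __name__ == "__main__":
    if needs_power_off(SAMPLE_IWCONFIG):
        print("power management is on; consider: sudo iwconfig wlan0 power off")
    else:
        print("power management is off or not reported")
```

The interface name wlan0 is the one used in the command above and may differ per machine.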

--//--
[Aug 20, 2011 7:53:32 AM]
finrabbit
Cruncher
Joined: Aug 8, 2006
Post Count: 1
Status: Offline
Re: Computation error when missing network connection

Thanks mr.larry for initiating the discussion and thanks Sekerob for answering! Actually mr.larry has been our Team Captain of CrunchyElephants for the last 5 years. We have burned 100 MWh in WCG computing, and fortunately some of that has assisted in wintertime heat production. We have lost at least 10 million WCG points due to network-related errors: breakage of ongoing WUs, or collapse of subsequently started WUs during a network disconnect. To be more specific: if one pulls the Ethernet cable from the port on the back of the PC, no calamities happen, but if service is disrupted via WiFi, ADSL or HomePNA with all interconnects still attached, then it breaks the ongoing WUs and the waiting queue, often within 2 minutes of job init, but also strings of jobs that are more than 50% finished.... Cheers? "finrabbit"
[Aug 20, 2011 9:06:22 AM]
BobCat13
Senior Cruncher
Joined: Oct 29, 2005
Post Count: 295
Status: Offline
Re: Computation error when missing network connection

Scheduler request failed: Couldn't resolve host name

BOINC sometimes doesn't deal well with DNS issues.

A couple of years ago, my service provider had problems with their DNS servers, and my Linux machine would get the same error, causing tasks to error out.

I entered WCG's IP addresses into the hosts file; after that the error changed to HTTP connect errors, but these did not cause the tasks to error out.

http://www.worldcommunitygrid.org/forums/wcg/...ead,25668_offset,0#230278
[Aug 20, 2011 3:00:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation error when missing network connection

That's an interesting note, BobCat13. I'd always thought it was WIFI wobble and that being on classic Ethernet wire gives unfailing stability (not an option here), with the router programmed to always reserve and give the same IP to the hosts. I have WCG set up in the hosts file on the Windows instances, but can't remember having done that for the Linux one. Will then also add the new download servers and hope it goes away forever.

thx.

--//--
[Aug 20, 2011 3:13:20 PM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: Computation error when missing network connection

I had guessed this was the cause of my recent spate of errors and am glad to see a thread confirming the cause. Just finished installing my latest rig and I had run out of network ports. I figured wireless would be quick and painless but between my lack of knowledge of Linux and the wireless continually losing connection, I just chained another router in and haven't had any errors since.
----------------------------------------

Distributed computing volunteer since September 27, 2000
[Aug 20, 2011 3:18:19 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation error when missing network connection

Try the "power off" command I gave above in a terminal window. It works wonders, 99.9% of the time. Always see 270Mb sensed speed in the panel monitor icon.

--//--

edit: That's 802.11N WIFI
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 20, 2011 3:26:31 PM]
[Aug 20, 2011 3:25:35 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7851
Status: Offline
Re: Computation error when missing network connection

I have two machines running Linux which are only intermittently connected to the internet via a wireless connection. I leave "network activity suspended" when they are disconnected from the internet. The processing keeps running and the results to be transmitted just pile up under the transfer tab. Once I reconnect to the internet, I have to re-enable the network connection in Linux, then, under the activity tab in BOINC manager, change the setting to "Network activity always available." The transfers are uploaded and BOINC manager will call for new tasks. With most projects, both the uploads and downloads are fairly small, so it does not take long to do 20 to 40 jobs. The exception is CEP2, because the uploads are greater than 20 MB, so I do not run CEP2 on the machines which only get connected to the internet intermittently. Once I am done I change back to "network activity suspended." I use Linux Mint, an Ubuntu variant. This also works for 3 Windows machines. The whole thing takes about 15 minutes, because I learned the hard way that the wireless pipeline will only handle 2 machines simultaneously. When I try to do 3 or more at the same time, the connection freezes up. Trying to push too much information down the pipe all at the same time, I suppose. Hope this helps.
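
The suspend / re-enable routine described above could also be scripted instead of clicked through in BOINC manager. This is only a minimal Python sketch, assuming the `boinccmd` command-line tool is installed, on the PATH, and supports `--set_network_mode` with the modes `always`, `auto` and `never` on the client version in use. The helper function names are made up for illustration.

```python
import subprocess

# Modes accepted by "boinccmd --set_network_mode" (assumed to be supported
# by the installed client version).
VALID_MODES = ("always", "auto", "never")


def network_mode_command(mode: str) -> list:
    """Build the boinccmd argument list for switching network activity."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return ["boinccmd", "--set_network_mode", mode]


def set_network_mode(mode: str) -> None:
    """Run boinccmd; raises CalledProcessError if the command fails."""
    subprocess.run(network_mode_command(mode), check=True)


if __name__ == "__main__":
    # Print the commands rather than executing them, so the sketch is safe
    # to run on a machine without BOINC installed.
    print(" ".join(network_mode_command("never")))   # while disconnected
    print(" ".join(network_mode_command("always")))  # once back online
```

Here `never` is meant to correspond to "network activity suspended" and `always` to "Network activity always available"; that mapping is an assumption and worth checking against the manager on the machine in question.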

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Aug 20, 2011 7:22:42 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation error when missing network connection

OK, so found I'd never added the WCG IPs to the hosts file under Ubuntu. With some digging of the names to go with the new download servers I get:

198.20.8.241 grid.worldcommunitygrid.org
198.20.8.246 www.worldcommunitygrid.org
198.20.8.246 secure.worldcommunitygrid.org
198.20.8.246 www.wcgrid.com
170.224.160.205 download.worldcommunitygrid.org
170.224.194.69 download1.worldcommunitygrid.org
170.225.97.195 download2.worldcommunitygrid.org

From lookup it appears that the last 3 domain names are associated with all 3 IP addresses. Cloud effect maybe? Now it's twiddling thumbs to see if these group result fails do not reappear.
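
To see which of these names are still missing from a machine's hosts file, the two lists can be compared. A small Python sketch follows, assuming the usual "IP name [name ...]" hosts format. The entries are copied from the list above as posted in 2011 and may no longer be valid; the "current" hosts content and the function names are hypothetical.

```python
# Entries copied from the list above (2011); they may no longer be valid.
WCG_ENTRIES = """\
198.20.8.241 grid.worldcommunitygrid.org
198.20.8.246 www.worldcommunitygrid.org
198.20.8.246 secure.worldcommunitygrid.org
198.20.8.246 www.wcgrid.com
170.224.160.205 download.worldcommunitygrid.org
170.224.194.69 download1.worldcommunitygrid.org
170.225.97.195 download2.worldcommunitygrid.org
"""


def parse_hosts(text: str) -> dict:
    """Map each hostname to its IP, skipping comments and blank lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def missing_entries(wanted: dict, existing: dict) -> list:
    """Return 'ip name' lines for wanted names not present in existing."""
    return [f"{ip} {name}" for name, ip in wanted.items() if name not in existing]


if __name__ == "__main__":
    # Hypothetical current content; a real check would read /etc/hosts.
    current = parse_hosts(
        "127.0.0.1 localhost\n198.20.8.241 grid.worldcommunitygrid.org\n"
    )
    for line in missing_entries(parse_hosts(WCG_ENTRIES), current):
        print(line)
```

The sketch only prints the lines that would need adding; editing the hosts file itself needs root and is left as a manual step.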

--//--

NB: Names/IPs wholly shared *as-is*. If in error, please post.

edit: correct 3rd last domain name.
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 23, 2011 2:58:44 PM]
[Aug 21, 2011 4:09:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Computation error when missing network connection

Sadly, Dickens could not care about hosts... had left the client on-line and WIFI went but only on the Linuxquad and all 4 jobs bombed with signal 11. Another 14 hours tubed. Lesson taken. Crunch off-line.

--//--
[Aug 22, 2011 7:54:38 PM]