| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 20
|
|
| Author |
|
|
launila@gmail.com
Cruncher Joined: Aug 8, 2006 Post Count: 7 Status: Offline Project Badges:
|
Hi
My friend has the following problem with BOINC under Linux. We have already searched the Internet, including this forum, but haven't found exactly the same problem or a solution for it.

If there is no working network connection when a task finishes, the task shows a computation error. The problem does not exist under Windows. He runs BOINC on many computers, and every one of them has had this problem for about two years. It also happens on different Linux distributions. If the result is marked as OK before the network connection is lost, there is no problem as long as the retry button under the Transfers tab is not pressed. If the network connection works all the time, there are no problems.

I remember that old BOINC versions could run without a network connection and reported all tasks fine once the connection was plugged back in. On Windows that already works very well. The problem has existed for so long that we are wondering: is a fix coming, or are the default BOINC settings wrong on all current Linux BOINC versions?

The reasons why it is not always possible to keep a good network connection: mobile Internet connections sometimes have problems. If a long file transfer is running, it may also take a long time to open new connections under Linux, so BOINC cannot upload jobs immediately. The computers are also sometimes used for a few days in places where there is no Internet.

Here are some log rows: Sat 20 Aug 2011 07:34:15 AM EEST World Community Grid Computation for task X0000123330305201010041110_0 finished |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi,
Not looked hard enough, as I've brought up the issue multiple times ;D WiFi by any chance? Yes, I'm meticulous about setting BOINC networking off-line on my Linux box and connecting on a schedule for 30 minutes towards the end of the UTC day, so it can upload and fetch. I do brief manual connects if I need to be at that box, then forget to suspend the network in BOINC, and at times a series of 3 to 5 jobs then fails when a task finishes. The problem has persisted through v6.12.33. It lost me about 24 hours of computing time this week, at 3:05 AM per the logs.

In the case of uploading / fetching over an unstable mobile connection, you'd want to suspend computing during that phase. I've not done that here, but I could schedule that half hour of networking to not compute. 95% of the time things go fine [out of luck once every 3-4 weeks], so I'll let it ride, since with 4 cores a half-hour suspend costs 2 hours of computing time :D

I have also set WiFi on Linux to not do power management, with the "sudo iwconfig wlan0 power off" command [wlan0 can also be wlan1, wlan2, depending on how many WiFi connections were ever configured]. Power management has been broken on Linux since about 10.04-10.10, but is supposed to be fixed in the upcoming 11.10. We'll see and then believe. Powering it off ensures maximum transmission speed. --//-- |
||
|
|
finrabbit
Cruncher Joined: Aug 8, 2006 Post Count: 1 Status: Offline Project Badges:
|
Thanks mr.larry for initiating the discussion and thanks Sekerob for answering! Actually mr.larry has been our Team Captain of CrunchyElephants for the last 5 years. We have burned 100 MWh in WCG computing, and fortunately some of that has assisted in wintertime heat production. We have lost at least 10 million WCG points due to network-related errors: breakage of ongoing WUs, or collapse of subsequently started WUs during a network disconnect. To be more specific: if one pulls the Ethernet cable from the port at the back of the PC, no calamities happen. But if service is disrupted via WiFi, ADSL or HomePNA with all interconnects still attached, it breaks the ongoing WUs and the waiting queue, often within 2 minutes of job init, and it also breaks strings of jobs that are more than 50% finished.... Cheers? "finrabbit"
|
||
|
|
BobCat13
Senior Cruncher Joined: Oct 29, 2005 Post Count: 295 Status: Offline Project Badges:
|
Scheduler request failed: Couldn't resolve host name

BOINC sometimes doesn't deal well with DNS issues. A couple of years ago, my service provider had problems with their DNS servers; my Linux machine would receive the same error and tasks would error out. I entered WCG's IP addresses into the hosts file, and after that the error changed to HTTP connect errors, which did not cause the tasks to error out. http://www.worldcommunitygrid.org/forums/wcg/...ead,25668_offset,0#230278 |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
That's an interesting note, BobCat13. I'd always thought it was WiFi wobble, and that being on a classic Ethernet wire gives unfailing stability (not an option here), with the router programmed to always reserve and give the same IP to the hosts. I have WCG set up in the hosts file on the Windows instances, but can't remember having done that for Linux. I will then also add the new download servers and hope it goes away forever.
thx. --//-- |
||
|
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
I had guessed this was the cause of my recent spate of errors and am glad to see a thread confirming the cause. Just finished installing my latest rig and I had run out of network ports. I figured wireless would be quick and painless but between my lack of knowledge of Linux and the wireless continually losing connection, I just chained another router in and haven't had any errors since.
----------------------------------------
Distributed computing volunteer since September 27, 2000 |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Try the "power off" command I gave above in a terminal window. It works wonders, 99.9% of the time. Always see 270Mb sensed speed in the panel monitor icon.
------------------------------------------//-- edit: That's 802.11N WIFI [Edit 1 times, last edit by Former Member at Aug 20, 2011 3:26:31 PM] |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7851 Status: Offline Project Badges:
|
I have two machines running Linux which are only intermittently connected to the internet via a wireless connection. I leave "network activity suspended" when they are disconnected from the internet. The processing keeps running and the results to be transmitted just pile up under the Transfers tab. Once I reconnect to the internet, I have to re-enable the network connection in Linux, then, under the Activity tab in BOINC Manager, change the setting to "Network activity always available." The transfers are uploaded and BOINC Manager will call for new tasks. With most projects, both the uploads and downloads are fairly small, so 20 to 40 jobs do not take long. The exception is CEP2, because its uploads are greater than 20 MB, so I do not run CEP2 on the machines which only get connected to the internet intermittently. Once I am done I change back to "network activity suspended."

I use Linux Mint, an Ubuntu variant. This also works for 3 Windows machines. The whole thing takes about 15 minutes, because I learned the hard way that the wireless pipeline will only handle 2 machines simultaneously. When I try to do 3 or more at the same time, the connection freezes up. Trying to push too much information down the pipe all at once, I suppose. Hope this helps.
----------------------------------------Cheers
Sgt. Joe
*Minnesota Crunchers* |
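
The manual toggle described in Sgt.Joe's post above could also be scripted. What follows is only a minimal sketch, assuming the `boinccmd` command-line tool that ships with the BOINC client is on the PATH and is allowed to talk to the local client (depending on the install, it may need to be run from the BOINC data directory or given a password). The function names are made up for illustration.

```python
import shutil
import subprocess

# Modes accepted by `boinccmd --set_network_mode`.
NETWORK_MODES = ("always", "auto", "never")


def network_mode_command(mode, boinccmd="boinccmd"):
    """Build the argv list that asks the local BOINC client to change network mode."""
    if mode not in NETWORK_MODES:
        raise ValueError(f"unknown network mode: {mode!r}")
    return [boinccmd, "--set_network_mode", mode]


def set_network_mode(mode):
    """Run the command; return False when boinccmd is not installed (assumed on PATH)."""
    if shutil.which("boinccmd") is None:
        return False
    subprocess.run(network_mode_command(mode), check=True)
    return True


if __name__ == "__main__":
    # Before going off-line: keep crunching, let finished results queue up.
    print(" ".join(network_mode_command("never")))
    # After reconnecting: allow uploads and work fetch again.
    print(" ".join(network_mode_command("always")))
```

Calling `set_network_mode("never")` before disconnecting and `set_network_mode("always")` after reconnecting mirrors the two menu choices in BOINC Manager; whether that avoids the computation errors discussed in this thread is not something the sketch can promise.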
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
OK, so found I'd never added the WCG IPs to the hosts file under Ubuntu. With some digging of the names to go with the new download servers I get:

198.20.8.241 grid.worldcommunitygrid.org
198.20.8.246 www.worldcommunitygrid.org
198.20.8.246 secure.worldcommunitygrid.org
198.20.8.246 www.wcgrid.com
170.224.160.205 download.worldcommunitygrid.org
170.224.194.69 download1.worldcommunitygrid.org
170.225.97.195 download2.worldcommunitygrid.org

From lookup it appears that the last 3 domain names are associated with all 3 IP addresses. Cloud effect maybe? Now it's twiddling thumbs to see if these group result fails do not reappear. --//-- NB: Names/IPs wholly shared *as-is*. If in error, please post. edit: correct 3rd last domain name. [Edit 1 times, last edit by Former Member at Aug 23, 2011 2:58:44 PM] |
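
For anyone wanting to check their own hosts file against the list in the post above, here is a minimal sketch. The addresses are copied as posted in August 2011 and may well have changed since; the function names are hypothetical, and the script only prints the missing lines rather than editing `/etc/hosts` itself.

```python
# Name/IP pairs exactly as posted above (August 2011); treat them as possibly stale.
WCG_HOSTS = [
    ("198.20.8.241", "grid.worldcommunitygrid.org"),
    ("198.20.8.246", "www.worldcommunitygrid.org"),
    ("198.20.8.246", "secure.worldcommunitygrid.org"),
    ("198.20.8.246", "www.wcgrid.com"),
    ("170.224.160.205", "download.worldcommunitygrid.org"),
    ("170.224.194.69", "download1.worldcommunitygrid.org"),
    ("170.225.97.195", "download2.worldcommunitygrid.org"),
]


def parse_hosts(text):
    """Return the set of host names already mapped in hosts-file text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # first field is the address, the rest are names
    return names


def missing_entries(hosts_text, wanted=WCG_HOSTS):
    """Return hosts-file lines for every wanted name not yet present."""
    present = parse_hosts(hosts_text)
    return [f"{ip}\t{name}" for ip, name in wanted if name not in present]


if __name__ == "__main__":
    try:
        with open("/etc/hosts") as fh:
            current = fh.read()
    except OSError:
        current = ""
    for entry in missing_entries(current):
        print(entry)
```

The printed lines can then be reviewed and appended by hand with root rights.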
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sadly, Dickens could not care about hosts... had left the client on-line and WIFI went but only on the Linuxquad and all 4 jobs bombed with signal 11. Another 14 hours tubed. Lesson taken. Crunch off-line.
--//-- |
||
|
|
|