| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 8
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have several Linux BOINC agents running, all behind some wonderfully difficult-to-work-with firewalls between them and WCG. Often the authentication goes away, stopping traffic. When I see this, I re-authenticate so the results can be pushed and new work ensues. This works well on the Windows (non-BOINC) servers.
But I notice BOINC will start up and then say something to the effect that it will talk to WCG in several days, even though work is completed. I have tried using the send results immediately flag but that doesn't seem to have any effect. At some random time it will decide to send results - or retrieve new work - and proceed. But if it takes too long, the firewall closes the door and it fails to talk, so it delays things tremendously - I do not get much time to devote to this troubleshooting. I now appear to have 3 devices not working since November 23 and fear the work units will be lost because too much time has elapsed.
Most (if not all) of my Linux boxes do not have the graphics capability to run the manager to see what is going on. I do everything on the command line. I tried loading the Windows boincmgr to talk to the Linux agents but I need to change the setup to allow a remote manager to interact. To make matters worse, they are changing the firewalls as I type this to force authentication only through a web browser, which will probably kill my wget method of authentication.
How can I jumpstart a Linux BOINC agent to send results or get new work?
I'll sign this: Screaming towards the Top 100 with the parking brake on |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello hechlerg,
Look at the 'Running the Linux Agent' page: http://www.worldcommunitygrid.org/help/viewTopic.do?shortName=linagent Hope this helps, mycrofth |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
So it sounds like I need to get the boinc manager running on the Windows box talking to the Linux agents so I can manage them. I'll see what I can do when I get in.
I should talk to Berkeley to see how this can be effectively managed on command line without any GUI. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
My Windows BOINC manager will not talk to my Linux agent. I thought maybe it was because it was trying to authenticate as "Administrator", so I created an admin named "root", but it still says invalid password, whether I provide none or the user password (conveniently set to be the same on both hosts). So I guess the Windows BOINC manager just was not intended to talk to Linux BOINC agents.
I am back to the "not reporting/submitting" issue. I tried the update_prefs and it still comes back that it is deferring communications for 6 days. I have no GUI on these Linux hosts, so I cannot run the BOINC manager on them to click the update button referenced in the help page (Running the Linux Agent). My startup command is now the following:
nohup ./boinc -redirectio -allow_remote_gui_rpc -return_results_immediately -update_prefs http://www.worldcommunitygrid.com &
I hate to destroy everything but it seems like I have to blow everything away and have it grab a new workload. The firewall is not an issue since I have wget to authenticate and keep the gate open. HELP! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quote:
My Windows BOINC manager will not talk to my Linux agent. I thought maybe it was because it was trying to authenticate as "Administrator", so I created an admin named "root", but it still says invalid password, whether I provide none or the user password (conveniently set to be the same on both hosts). So I guess the Windows BOINC manager just was not intended to talk to Linux BOINC agents. I am back to the "not reporting/submitting" issue. I tried the update_prefs and it still comes back that it is deferring communications for 6 days. I have no GUI on these Linux hosts, so I cannot run the BOINC manager on them to click the update button referenced in the help page (Running the Linux Agent). My startup command is now the following: nohup ./boinc -redirectio -allow_remote_gui_rpc -return_results_immediately -update_prefs http://www.worldcommunitygrid.com & I hate to destroy everything but it seems like I have to blow everything away and have it grab a new workload. The firewall is not an issue since I have wget to authenticate and keep the gate open. HELP!

I have 6 Linux boxes that I control from any Windows box...
Your Linux boxes have a password stored in the file gui_rpc_auth.cfg. It's usually a really long string of numbers and letters, and that's the password you have to use when connecting from the Windows boxes. I know you can change it to something you can remember, but that requires an extra step that I don't remember off the top of my head... Just write it down and store it in a txt file like I did lol.
You MIGHT also have a remote_hosts.cfg. I don't, but I know it can exist in BOINC. Whether or not you already have one, you can use that file to add the IPs or names of the boxes allowed to connect. Kind of like a hosts.allow |
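For anyone who would rather script this than open the files by hand, here is a minimal Python sketch of the two steps described above: reading the GUI RPC password and listing an allowed host. It assumes the password file is gui_rpc_auth.cfg in the BOINC directory and that the allowed-hosts file is named remote_hosts.cfg with one host or IP per line and '#' starting a comment; the function names and the file-name parameter are hypothetical helpers, not part of BOINC.

```python
from pathlib import Path


def read_gui_rpc_password(boinc_dir):
    """Return the password stored in gui_rpc_auth.cfg, or None if the file is absent."""
    path = Path(boinc_dir) / "gui_rpc_auth.cfg"
    if not path.exists():
        return None
    # The file is assumed to hold only the password, possibly with a trailing newline.
    return path.read_text().strip()


def add_allowed_host(boinc_dir, host, filename="remote_hosts.cfg"):
    """Append host to the allowed-hosts file unless it is already listed.

    Returns the list of hosts in the file after the call.
    The file name and its one-host-per-line layout are assumptions.
    """
    path = Path(boinc_dir) / filename
    lines = path.read_text().splitlines() if path.exists() else []
    # Ignore blank lines and '#' comments when deciding what is already listed.
    hosts = [ln.strip() for ln in lines
             if ln.strip() and not ln.strip().startswith("#")]
    if host not in hosts:
        lines.append(host)
        path.write_text("\n".join(lines) + "\n")
        hosts.append(host)
    return hosts
```

Something like print(read_gui_rpc_password("/path/to/BOINC")) on the Linux box would show the string to type into the Windows manager. The client most likely needs a restart before it picks up a changed hosts file.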
||
|
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges:
|
Quote:
But I notice BOINC will start up and then say something to the effect that it will talk to WCG in several days, even though work is completed. I have tried using the send results immediately flag but that doesn't seem to have any effect. At some random time it will decide to send results - or retrieve new work - and proceed. But if it takes too long, the firewall closes the door and it fails to talk, so it delays things tremendously - I do not get much time to devote to this troubleshooting. I now appear to have 3 devices not working since November 23 and fear the work unit will be lost due to too much time having lapsed.

hechlerg,
If BOINC attempts to communicate with the servers but is unable to reach them, it will 'back off' for a short period of time and then try again. After each successive failed attempt, it will back off for a longer and longer period of time. This was implemented because if the servers were indeed unavailable, then as the outage grew longer, more and more clients would be wanting to contact the servers. Increasing the delay staggers the clients' attempts to communicate with the servers (otherwise, if everyone attempted to communicate at once, it could overwhelm the servers).
Now to address your issue. I believe that your client attempted to contact the servers but was unable to do so because of the firewall. Please try the following:
In your BOINC directory there is a file called client_state.xml. In that file is a field that looks like:
<min_rpc_time>1133821241.616419</min_rpc_time>
This field represents the next time that the client will attempt to communicate with the servers (the time is in 'unix time' - you can convert this to a regular date via a site such as http://www.onlineconversion.com/unix_time.htm). What you need to do is make the value of this field less than the current time. To do this:
1. Stop BOINC
2. Edit the file and field so that it is something like <min_rpc_time>1123821241.616419</min_rpc_time>
3. Start BOINC using something like: nohup ./boinc -return_results_immediately -update_prefs http://www.worldcommunitygrid.com &
And the BOINC client will contact the servers.
Let us know if this does what you need.
Kevin |
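On a box with no GUI, the timestamp check and the edit can also be done from a script. The following is a rough Python sketch, not an official BOINC tool: it converts a unix time to a readable UTC date and rewrites any min_rpc_time value that lies in the future to one day in the past. The function names and the one-day offset are assumptions chosen for illustration, and it treats client_state.xml as plain text rather than parsing it.

```python
import re
import time
from datetime import datetime, timezone

# Matches the field shown in the post, e.g. <min_rpc_time>1133821241.616419</min_rpc_time>
MIN_RPC_RE = re.compile(r"<min_rpc_time>\s*([0-9.]+)\s*</min_rpc_time>")


def unix_to_utc(ts):
    """Convert a unix timestamp (number or string) to a readable UTC date."""
    moment = datetime.fromtimestamp(float(ts), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


def reset_min_rpc_time(xml_text, now=None):
    """Return xml_text with every future min_rpc_time moved one day into the past.

    Values that are already in the past are left untouched.
    """
    now = time.time() if now is None else now
    past = "%.6f" % (now - 86400)  # one day ago; any earlier value should also do

    def fix(match):
        if float(match.group(1)) > now:
            return "<min_rpc_time>%s</min_rpc_time>" % past
        return match.group(0)

    return MIN_RPC_RE.sub(fix, xml_text)


if __name__ == "__main__":
    # The value quoted in the post falls in early December 2005.
    print(unix_to_utc("1133821241.616419"))
```

To apply it, stop BOINC first, keep a backup copy of client_state.xml, read the file into a string, pass it through reset_min_rpc_time, write the result back, and then start BOINC again as in step 3.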
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thank you Chase and Kevin. This is very useful information. I will attempt your suggestions at my next opportunity.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I like that little tweak on the time - works like a champ! I *think* I have all my Linux boinc agents back at work now - yeay!
I did not get the boincmgr working yet though - need more time. |
||
|
|
|