Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 820
Status: Offline
Resolved! - Linux remote machine stopped crunching, need reset, typing something wrong

My main cruncher is a rented, headless machine that I can only reach via SSH. I've run it for years with no real complaints. I'm running FAHB exclusively on it now, which means 24-hour deadlines. My stats looked off in the last few "My Contribution" updates, and it turns out the machine is still on but seems to have stopped working on its WUs.

I rebooted the machine, and running 'top' over SSH shows that boinc is running but none of the expired WUs are.

It looks like I have enough WUs for each core, and I'm not sure what went wrong, so I want to reset BOINC. I typed
boinccmd -- https://www.worldcommunitygrid.org reset
boinccmd --https://www.worldcommunitygrid.org reset
boinccmd --project https://www.worldcommunitygrid.org reset
and a few other variants but each time it just shows me a list of commands.

How am I typing this wrong?

Edit: Marking the topic as "resolved"
----------------------------------------

----------------------------------------
[Edit 1 times, last edit by Seoulpowergrid at Oct 28, 2018 6:11:14 AM]
[Oct 26, 2018 3:30:45 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Linux remote machine stopped crunching, need reset, typing something wrong

boinccmd --project https://www.worldcommunitygrid.org reset

That should be the right format. But I would guess that the right project URL starts with http://, not https://. That is the case on my machines. I'm not sure whether the two are interchangeable.
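
A minimal sketch of one way to check which URL the client is actually attached with before resetting. It assumes that `boinccmd --get_project_status` prints a `master URL:` line for each attached project; that output format is an assumption, so check it on your own client. The helper name `master_urls` is made up for illustration.

```shell
# Hypothetical helper: print the URL from every "master URL:" line on stdin.
# Assumes boinccmd --get_project_status labels each project URL this way.
master_urls() {
    sed -n 's/^[[:space:]]*master URL:[[:space:]]*//p'
}

# Typical use on the cruncher (not run here):
#   boinccmd --get_project_status | master_urls
#   boinccmd --project "<URL printed above>" reset
```

Copying the URL from the client's own output avoids guessing between http:// and https://.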

Another tip if you have a headless cruncher: try boinctui, a terminal-based manager that is nearly as comfortable as the GUI BOINC Manager and much more comfortable than boinccmd.
[Oct 26, 2018 3:48:55 PM]
Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 820
Status: Offline
Re: Linux remote machine stopped crunching, need reset, typing something wrong

Thanks for the help and advice. I'm slow with Linux but am learning as needed :)
----------------------------------------

[Oct 27, 2018 12:21:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Linux remote machine stopped crunching, need reset, typing something wrong

It varies a little depending on the Linux distro, but there should be a pair of log files, /var/log/boinc.log and /var/log/boincerr.log, which may give a clue about what's gone wrong.

Also review your standard system log (/var/log/messages or /var/log/syslog on most Linux systems) for obvious errors, and check the basics such as disk space and partitions mounted read-only (for example the one holding your /var/lib/boinc/ or /var/lib/boinc-client/ directory).
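
A rough sketch of those basic checks, assuming the data directory is /var/lib/boinc-client and the logs live where described above (both vary by distro). The helper name `is_readonly_opts` is made up for illustration; it only inspects a mount option string such as the one `findmnt` reports.

```shell
# Hypothetical helper: succeed if a comma-separated mount option string
# (e.g. "ro,relatime") contains the read-only flag "ro".
is_readonly_opts() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Typical use on the cruncher (paths are assumptions; not run here):
#   df -h /var/lib/boinc-client
#   opts=$(findmnt -no OPTIONS -T /var/lib/boinc-client)
#   is_readonly_opts "$opts" && echo "BOINC data directory is on a read-only mount"
#   grep -i error /var/log/boinc.log /var/log/boincerr.log
```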

Hard to say what's gone wrong yet - the fact that the boinc process starts would seem to be a good indicator it hasn't completely blown up (a standard system upgrade gone wrong, resulting in a boinc that won't even start for example) so there has to be a log somewhere indicating why it's not having a good day.

EDIT: also, remember that the actual WU is a separate little binary that is launched and controlled by the boinc wrapper process. If your system is old and hasn't been upgraded in a while, it's possible that the newest WU binaries being downloaded won't run or are crashing invisibly ("segfault" in Linux terms). They are child processes of the main boinc one, so 'ps -ef | grep boinc' or 'ps auxw | grep boinc' or 'pstree -la boinc' (assuming it's running as user "boinc") should show each WU binary running with a long command line, something like this:

# pstree -lA boinc

sh---boinc-+-wcgrid_mcm1_map---{wcgrid_mcm1_map}
           |-wcgrid_mip1_ros---{wcgrid_mip1_ros}
           |-wcgrid_oet1_vin---2*[{wcgrid_oet1_vin}]
           |-wcgrid_zika_vin---2*[{wcgrid_zika_vin}]
           `-{boinc}

# ps o ppid,pid,user,comm -U boinc

 PPID   PID USER     COMMAND
    1   585 boinc    sh
  585   597 boinc    boinc
  597 12000 boinc    wcgrid_mcm1_map
  597 13098 boinc    wcgrid_oet1_vin
  597 13622 boinc    wcgrid_mip1_ros
  597 13745 boinc    wcgrid_zika_vin


Generally a running boinc looks like that: a parent process (the main boinc), then several running WUs, usually one per CPU thread.
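
As a quick check of that fingerprint, here is a small sketch that counts the direct children of the boinc process from `ps` output. It assumes the four-column layout shown above and that the client's command name is exactly "boinc"; the helper name `count_wu_children` is made up for illustration.

```shell
# Hypothetical helper: read "PPID PID USER COMMAND" rows on stdin and print
# how many processes are direct children of the process named "boinc".
count_wu_children() {
    awk '$1 ~ /^[0-9]+$/ {                 # skip the header row
             parent[$2] = $1               # remember each PID -> PPID
             if ($4 == "boinc") client = $2
         }
         END {
             n = 0
             for (pid in parent)
                 if (client != "" && parent[pid] == client) n++
             print n
         }'
}

# Typical use (assumes the client runs as user "boinc"; not run here):
#   ps o ppid,pid,user,comm -U boinc | count_wu_children
```

Fed the sample table above it prints 4. A count of 0 while tasks are queued would suggest the WU binaries are not starting or are dying right away.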
----------------------------------------
[Edit 1 times, last edit by xithryx at Oct 27, 2018 2:45:45 PM]
[Oct 27, 2018 2:21:18 PM]
Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 820
Status: Offline
Re: Linux remote machine stopped crunching, need reset, typing something wrong

It hasn't been crunching for a few days. I've been rather busy but have been trying things here and there. I tried the reset command that Sheridon helped me with; it was accepted but didn't seem to change anything. More reboots and more 'get_tasks' and 'get_file_transfers' changed nothing, but I noted an nrpc error along with another connection error. I changed the profile to SCC, as my other machines are running that.

A day passed and I repeated the steps; nothing. The HDD has plenty of room and RAM is plentiful. Then, 10 minutes later, while I was checking the logs as xithryx recommended, a bunch of SCC WUs decided to come in and everything is fine. Once the queue was mostly full of SCC, I swapped the profile to FAHB; it downloaded those successfully and is working on them.

*insert really confused emoji scratching head and then shrugging*

Thanks again for all the feedback! I'm still not fully sure what went wrong but it seems to be working now. I've looked at boinctui and it looks very nice. I'll check that in more detail. Thanks again and have a lovely day~
----------------------------------------

[Oct 27, 2018 5:08:13 PM]