World Community Grid Forums
Thread Status: Active Total posts in this thread: 16
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
This morning, in a cron job, wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu crashed twice, not on "suspend" but on "systemctl stop boinc-client.service", which does a "boinccmd --quit". It also crashed when I ran "systemctl stop boinc-client.service" from the command line: I tried it twice, and it crashed both times.
Also, regarding my previous post: I may have misinterpreted it. There may have been 12 *minutes* between checkpoints, which means a loss of CPU cycles. The list command should be "ls -ltr --time-style=full-iso /var/lib/boinc/slots/*/*check*". However, I can't test it at this time, because I can't stop boinc without it crashing!
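For anyone who wants to see how far apart the checkpoints are without reading raw "ls" output, here is a minimal sketch built around the same glob as the command above. The function name list_checkpoints is made up for illustration, and it assumes GNU find and GNU date (as shipped on Fedora); the default path is the one from the post and may differ on other installs.

```shell
#!/bin/sh
# Hypothetical helper: list checkpoint files oldest-first, with the
# modification time and the age in seconds of each file.
# $1 is the slots directory (assumed default: /var/lib/boinc/slots).
list_checkpoints() {
    slots_dir="${1:-/var/lib/boinc/slots}"
    now=$(date +%s)
    # Depth 2 mirrors the glob slots/*/*check*; -printf is GNU find only.
    find "$slots_dir" -mindepth 2 -maxdepth 2 -type f -name '*check*' \
        -printf '%T@ %p\n' 2>/dev/null |
        sort -n |
        while read -r mtime path; do
            secs=${mtime%.*}            # drop the fractional seconds
            age=$(( now - secs ))
            printf '%s\t%ss ago\t%s\n' \
                "$(date -d "@$secs" '+%Y-%m-%d %H:%M:%S')" "$age" "$path"
        done
}
```

Calling list_checkpoints with no argument reads the default directory. Comparing the timestamps of successive runs gives a rough idea of the checkpoint interval for each slot.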
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2356 Status: Recently Active
Maybe I missed it, but how did you determine that wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu crashed? Was there an error message saying that it was crashing? What was the exact error message?
Since you said it crashes in some cases when you stop BOINC, I would then look in the relevant 'slots' directory (e.g. ~boinc/slots/3) for the most recently updated files (including those in any subdirectory) to see if any error messages were left behind. This may help in determining the cause of the problem.
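One way to do that lookup is sketched below. The function name recent_slot_files and the 60-minute default window are made up for illustration, and the -printf option assumes GNU find.

```shell
#!/bin/sh
# Hypothetical helper: list files under a slot directory (subdirectories
# included) that were modified within the last N minutes, oldest first.
# $1 = slot directory, e.g. ~boinc/slots/3; $2 = minutes (assumed default 60).
recent_slot_files() {
    slot_dir="$1"
    minutes="${2:-60}"
    # %T+ prints the modification time as YYYY-MM-DD+HH:MM:SS.fraction,
    # which sorts correctly as plain text.
    find "$slot_dir" -type f -mmin "-$minutes" -printf '%T+ %p\n' | sort
}
```

Running it right after a crash, for example recent_slot_files ~boinc/slots/3 10, narrows the search to files the task touched shortly before it died.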
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
From the journal:
... ANOM_ABEND ...comm="wcgrid_mip1_ros" exe="/var/lib/boinc/projects/www.worldcommunitygrid.org/wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu" sig=11 ...
systemd-coredump[14639]: Process 12027 (wcgrid_mip1_ros) of user 976 dumped core...
In /var/lib/boinc/slots/*/stderr.txt: nothing of significance, no indication of an ABEND.
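To tally records like the one quoted above, a small filter along these lines might help. The function name abend_summary is hypothetical; it only assumes that each audit record carries comm="..." and sig=N fields as in the excerpt, and it reads plain text on stdin so it works on live journal output or a saved log.

```shell
#!/bin/sh
# Hypothetical helper: count ANOM_ABEND records per signal number for one
# command name. $1 = value of the comm="..." field
# (assumed default: wcgrid_mip1_ros, as seen in the journal excerpt).
abend_summary() {
    comm="${1:-wcgrid_mip1_ros}"
    grep 'ANOM_ABEND' |
        grep -F "comm=\"$comm\"" |
        sed -n 's/.*sig=\([0-9][0-9]*\).*/\1/p' |   # keep only the signal number
        sort -n | uniq -c |
        awk '{ printf "sig=%s count=%s\n", $2, $1 }'
}
```

A typical call would be journalctl --since today | abend_summary, which prints one line per signal number seen for that command.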
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2356 Status: Recently Active
Aaah, sig=11. If I'm interpreting this right, this is a segmentation violation:
Signal     Value   Action   Comment
SIGSEGV    11      Core     Invalid memory reference
(from signal(7)).
I'm afraid only the techs can help here.
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jan 15, 2019 12:49:47 AM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I reported the problem to the Red Hat / Fedora Bugzilla. It really seems like a WCG problem, not a Fedora problem and not a boinc problem, but maybe that isn't true.
From the journal, kind of weird:
"systemd-coredump[14640]: Resource limits disable core dumping for process 11990 (wcgrid_mip1_ros)"
This thing is too big to dump? I'm sure there are ways around that, but huh?
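As far as I can tell, that journal message refers to the process's core file size limit (RLIMIT_CORE) being zero or close to it, not to the dump being too large. A minimal sketch for checking the limit of a running process on Linux follows; the function name core_limit is made up for illustration, and it relies on the "Max core file size" row of /proc/PID/limits.

```shell
#!/bin/sh
# Hypothetical helper: print the soft and hard core file size limits of a
# process, taken from /proc/<pid>/limits (Linux only).
# $1 = process ID (assumed default: the current shell).
core_limit() {
    pid="${1:-$$}"
    # Row layout: "Max core file size  <soft>  <hard>  bytes"
    awk '/^Max core file size/ { print "soft=" $5, "hard=" $6 }' \
        "/proc/$pid/limits"
}
```

For example, core_limit "$(pgrep -o -f wcgrid_mip1_rosetta)" would report the limits of the oldest matching task. If the soft limit is 0, raising it for the service (commonly done with a LimitCORE= setting in a systemd drop-in, though that is untested on this setup) may allow a dump to be stored.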
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I've experienced 42 ABENDs since December 11th, all of them for the Microbiome Immunity Project.
EDIT: I just learned that CPU cycles are *always* lost when you stop boinc, whether or not there was an abend. See http://wcg.wikia.com/wiki/Checkpoints
This is a design decision by the developers. I was assuming that a final checkpoint would be written when boinc was stopped, but this is not true.
----------------------------------------
[Edit 1 times, last edit by dswaner at Jan 15, 2019 10:52:29 PM]