Thread Status: Active
Total posts in this thread: 16
This topic has been viewed 3574 times and has 15 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashing on "suspend": wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu

This morning, in a cron job, wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu crashed twice, not on "suspend" but on "systemctl stop boinc-client.service", which does a "boinccmd --quit". It also crashed both times when I ran "systemctl stop boinc-client.service" from the command line.

Also, regarding my previous post: I may have misinterpreted the listing. There may have been 12 *minutes* between checkpoints, which means a loss of CPU cycles. The list command should be "ls -ltr --time-style=full-iso /var/lib/boinc/slots/*/*check*". However, I can't test it at this time, because I can't stop boinc without it crashing!
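
For anyone who would rather script this check, here is a minimal Python sketch of the same idea as the ls command above: it lists the files matching the checkpoint pattern, oldest first, together with each file's age. The function name checkpoint_ages is made up for illustration, and treating a file's modification time as the time of the last checkpoint is an assumption, not something WCG has confirmed.

```python
import glob
import os
import time
from datetime import datetime

# Pattern taken from the ls command above; adjust it to your BOINC data directory.
PATTERN = "/var/lib/boinc/slots/*/*check*"


def checkpoint_ages(pattern=PATTERN, now=None):
    """Return (path, mtime, age_seconds) tuples, oldest first (like ls -ltr)."""
    now = time.time() if now is None else now
    rows = []
    for path in glob.glob(pattern):
        mtime = os.path.getmtime(path)
        rows.append((path, mtime, now - mtime))
    # Sort by modification time so the most recently written file comes last.
    rows.sort(key=lambda row: row[1])
    return rows


if __name__ == "__main__":
    for path, mtime, age in checkpoint_ages():
        stamp = datetime.fromtimestamp(mtime).isoformat(sep=" ")
        print(f"{stamp}  {age / 60:6.1f} min old  {path}")
```

Under that assumption, the age column is a rough estimate of how much work a task would lose if it were stopped at that moment.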
[Jan 14, 2019 2:19:45 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2356
Status: Recently Active

Maybe I missed it, but how did you determine that wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu crashed? Was there an error message saying that it was crashing? What was the exact error message?

Since you said it crashes in some cases when you stop BOINC, I would then take a look in the designated 'slots' directory (e.g. ~boinc/slots/3) for the latest updated files (also in any subdirectory) to see if there are any error messages left behind.

This may help in determining the cause of the problem.
[Jan 14, 2019 4:26:45 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

From the journal:
... ANOM_ABEND ...comm="wcgrid_mip1_ros" exe="/var/lib/boinc/projects/www.worldcommunitygrid.org/wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu" sig=11
... systemd-coredump[14639]: Process 12027 (wcgrid_mip1_ros) of user 976 dumped core...

Nothing of significance in /var/lib/boinc/slots/*/stderr.txt: no indication of an ABEND.
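
To pull the interesting fields out of journal lines like the ones above, a small parser can help. This is only a sketch: parse_abend is a hypothetical helper name, and it assumes the record uses plain key=value pairs with optional double quotes, as in the excerpt.

```python
import re

# Matches key=value pairs; the value is either double-quoted or a run of non-spaces.
FIELD = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_abend(line):
    """Return a dict of key=value fields for an ANOM_ABEND record, else None."""
    if "ANOM_ABEND" not in line:
        return None
    fields = {}
    for key, raw, quoted in FIELD.findall(line):
        # Use the text inside the quotes when the value was quoted.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


if __name__ == "__main__":
    sample = (
        '... ANOM_ABEND ...comm="wcgrid_mip1_ros" '
        'exe="/var/lib/boinc/projects/www.worldcommunitygrid.org/'
        'wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu" sig=11'
    )
    print(parse_abend(sample))
```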
[Jan 14, 2019 5:01:51 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2356
Status: Recently Active

Aaah, sig=11. If I'm interpreting this right, this is a segmentation violation:
       Signal     Value     Action   Comment
       ──────────────────────────────────────────────────
       SIGSEGV      11       Core    Invalid memory reference
(from signal(7)). I'm afraid only the techs can help here.
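
The same lookup can be done without the man page, using Python's standard signal module. A minimal sketch, assuming Python 3.8 or later (for signal.strsignal) and a Linux host, since signal numbers vary between platforms; describe_signal is just an illustrative name.

```python
import signal


def describe_signal(number):
    """Return (name, description) for a signal number on this platform."""
    sig = signal.Signals(number)  # raises ValueError for an unknown number
    return sig.name, signal.strsignal(number)


if __name__ == "__main__":
    # sig=11 is the value reported in the journal excerpt.
    print(describe_signal(11))
```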
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Jan 15, 2019 12:49:47 AM]
[Jan 14, 2019 5:55:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

I reported the problem to Red Hat / Fedora bugzilla. It really seems like it is a WCG problem, not a Fedora problem, and not a boinc problem - but maybe that isn't true.

From the journal - kind of weird:
"systemd-coredump[14640]: Resource limits disable core dumping for process 11990 (wcgrid_mip1_ros)"

This thing is too big to dump? I'm sure there are ways around that, but huh?
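
As far as I know, that message relates to the process's core file size limit (RLIMIT_CORE) being zero or very small, not to the dump being too large, but treat that as an assumption until it is checked on the machine. One way to check is to read "Max core file size" from /proc/<pid>/limits while the task is running. A minimal sketch, Linux only; core_limit and read_core_limit are hypothetical helper names.

```python
CORE_LABEL = "Max core file size"


def core_limit(limits_text):
    """Return (soft, hard) core size limits from /proc/<pid>/limits text, or None."""
    for line in limits_text.splitlines():
        if line.startswith(CORE_LABEL):
            # Columns after the label are: soft limit, hard limit, units.
            soft, hard = line[len(CORE_LABEL):].split()[:2]
            return soft, hard
    return None


def read_core_limit(pid="self"):
    """Read the core size limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return core_limit(handle.read())
```

For example, read_core_limit(11990) would report the limits for the process ID from the journal line, provided that process is still running. A soft limit of 0 would be consistent with the message.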
[Jan 14, 2019 6:45:35 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

I've experienced 42 ABENDS since December 11th - all are for the Microbiome Immunity Project.

EDIT:
I just learned that CPU cycles are *always* lost when you stop boinc, whether or not there was an abend. See http://wcg.wikia.com/wiki/Checkpoints
This is a design decision by the developers. I had assumed that a final checkpoint would be written when boinc was stopped, but this is not true.
----------------------------------------
[Edit 1 times, last edit by dswaner at Jan 15, 2019 10:52:29 PM]
[Jan 15, 2019 3:00:46 PM]