Thread Status: Active
Total posts in this thread: 7
This topic has been viewed 2114 times and has 6 replies
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
July2015 - getting signal 11 on CEP2

Hi,
It looks like all my CEP2 tasks are erroring out with a segmentation fault.
A typical WU error is:
https://secure.worldcommunitygrid.org/ms/devi...Log.do?resultId=456818186

I'm running Ubuntu 14.04 (Trusty Tahr) on a 64 bit machine.
So far 43 CEP2 tasks had this error today.
Some tasks get the fault early on; others take a bit longer. Example:
  Result Name: E231400_243_S.286.C34H22N4O2S1.NUHIDNLVHFFOGM-UHFFFAOYSA-N.1_s1_14_1--
<core_client_version>7.6.2</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[06:05:40] Number of jobs = 8
[06:05:40] Starting job 0,CPU time has been restored to 0.000000.
[06:05:40] Starting new Job
[06:05:40] Qink name = fldman
[06:05:41] Qink name = gesman
[06:05:42] Qink name = scfman
[06:45:12] Qink name = anlman
[06:45:12] Qink name = drvman
[06:47:21] Qink name = optman
[06:47:21] Qink name = fldman
[06:47:21] Qink name = gesman
[06:47:22] Qink name = scfman
[07:02:35] Qink name = anlman
[07:02:35] Qink name = drvman
[07:04:47] Qink name = optman
[07:04:47] Qink name = fldman
[07:04:47] Qink name = gesman
[07:04:49] Qink name = scfman
</stderr_txt>


Other WUs abort within a minute:
Result Log

Result Name: E231436_608_S.234.C28H26O1S2.RIHDJSGBDWKQAS-UHFFFAOYSA-N.5_s1_14_1--
<core_client_version>7.6.2</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[07:00:59] Number of jobs = 8
[07:00:59] Starting job 0,CPU time has been restored to 0.000000.
[07:00:59] Starting new Job
[07:00:59] Qink name = fldman
[07:01:00] Qink name = gesman
[07:01:01] Qink name = scfman

The BOINC event log had this to say (on a sample failure):

Sat 04 Jul 2015 10:19:16 AM EDT | World Community Grid | Scheduler request completed
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | [task] Process for E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0 exited, status 11, task state 1
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | [task] process got signal 11
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | [task] task_state=WAS_SIGNALED for E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0 from handle_exited_app
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | [task] result state=COMPUTE_ERROR for E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0 from CS::report_result_error
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | Output file E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0_0 for task E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0 absent
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | Output file E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0_1 for task E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0 absent
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | Output file E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0_2 for task E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0 absent
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | Output file E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0_3 for task E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0 absent
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | [task] result state=COMPUTE_ERROR for E231439_881_S.274.C34H26N4O2.JZZPFYONTNLVJV-UHFFFAOYSA-N.7_s1_14_0 from CS::app_finished
Sat 04 Jul 2015 10:19:56 AM EDT | World Community Grid | [task] ACTIVE_TASK::start(): forked process: pid 5702
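
The [task] lines in this excerpt appear to come from BOINC's task_debug event-log flag (an assumption about how this log was produced). A minimal sketch of turning that flag on, assuming the Ubuntu boinc-client package, where cc_config.xml normally lives in the BOINC data directory (usually /var/lib/boinc-client):

```xml
<!-- cc_config.xml in the BOINC data directory -->
<cc_config>
  <log_flags>
    <!-- log each task's state changes (start, exit, signal, result state) -->
    <task_debug>1</task_debug>
  </log_flags>
</cc_config>
```

After saving, running boinccmd --read_cc_config (or Options > Read config files in BOINC Manager) should make the client pick up the change without a restart.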

I've taken CEP2 off my list for now.
One is still running. I'll suspend all other tasks and see if it completes.
Any suggestions for debugging?
Thanks, Jay

PS
The machine recently passed memory tests and SMART disk tests.
Running BOINC 7.6.4.2 from the PPA (installed in the last week).
----------------------------------------

[Jul 4, 2015 5:18:32 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: July2015 - getting signal 11 on CEP2

Straight from the Start Here FAQ Index [stickied at the top of all forums ;]
http://boincfaq.mundayweb.com/index.php?view=459
[Jul 4, 2015 5:28:39 PM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Re: July2015 - getting signal 11 on CEP2

Once again,
T H A N K S Rob!

Will start delving into ia32.
I had run a memtest - will try again.
(I had a blip where UEFI wouldn't let memtest86 into GRUB. I'll burn another CD and try booting from non-UEFI.)

Jay
ps
enjoyed the link on climate warming.
----------------------------------------

[Jul 4, 2015 11:55:46 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: July2015 - getting signal 11 on CEP2

That FAQ by ageless is a little aged (pun). The latest Ubuntu does not need this one; it won't even install. Search the forums for which libs to check/install with apt-get install. I get a 100% score on Ubuntu 14.04 LTS, but I've set BOINC to pause when certain programs run (using <exclusive_app> in cc_config.xml) and when non-BOINC load is greater than 50 percent.

Edit: Of course, the OS and client are 64-bit.
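
A minimal sketch of the <exclusive_app> setting mentioned above, placed in the <options> section of cc_config.xml. The program name here is a hypothetical placeholder; substitute the executable name of whatever program should pause BOINC while it runs (one <exclusive_app> line per program):

```xml
<cc_config>
  <options>
    <!-- "someprogram" is a placeholder, not a real program name:
         BOINC suspends computing while a process with this name is running -->
    <exclusive_app>someprogram</exclusive_app>
  </options>
</cc_config>
```

The 50 percent non-BOINC load threshold is a separate computing preference (set in BOINC Manager's computing preferences), not part of cc_config.xml.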
----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 5, 2015 7:51:49 AM]
[Jul 5, 2015 7:50:41 AM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Re: July2015 - getting signal 11 on CEP2

Thanks Rob,
I ran Memtest86+: no errors reported in more than one pass.
I have tried running at 25% and 37.5% of 8 processors (1 CEP, 1 Einstein (GPU), 1 POGS).
So far, 33 minutes without a segmentation violation.
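
For reference, 25% of 8 processors is 2 CPUs and 37.5% is 3 CPUs (0.375 x 8 = 3), which matches the three tasks listed. A minimal sketch of pinning that limit locally, assuming a global_prefs_override.xml in the BOINC data directory (this file overrides the web preferences on that one machine):

```xml
<global_preferences>
  <!-- use at most 37.5% of the processors: 0.375 x 8 = 3 CPUs -->
  <max_ncpus_pct>37.5</max_ncpus_pct>
</global_preferences>
```

boinccmd --read_global_prefs_override should make a running client reload the file.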

Currently trying a different display manager (was Marco in MATE, now GNOME). Maybe socket congestion?

On the list of things to try (grasping straws here):
Re-seating disk cables, etc.
Remove and reinstall all of BOINC once all WUs complete (was 7.4.22-ppa, will try 7.2.42)
Rebuild the computer with the Ubuntu 15.04 release (will try MATE again)

Thanks again,
Jay
----------------------------------------

[Jul 6, 2015 4:18:53 AM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Re: July2015 - getting signal 11 on CEP2

Greetings,
An update:
The problem may have been a loose cable.
(I wasn't careful when vacuuming out dust.)

Also, I had misunderstood some settings.

This article from the Harvard team helped!
http://static.molecularspace.org/uploads/2013..._CEP2_Custom_Settings.pdf

The next item on the list is to replace old batteries in my UPS.

I would especially like to thank the Harvard team for the Tips and Tricks article.

Jay
----------------------------------------

[Jul 8, 2015 2:01:11 AM]
jay_Orlando
Senior Cruncher
USA
Joined: Jan 4, 2006
Post Count: 189
Re: July2015 - getting signal 11 on CEP2

Yes.
The cable must have been the problem.
I have now completed 2 CEP2 WUs with other WUs running.
Thanks to all.
Jay
----------------------------------------

[Jul 8, 2015 3:08:41 PM]