Total posts in this thread: 16
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Issues with WCG errors

I'm running Gentoo on a machine that otherwise seems rock stable. However, WCG under BOINC dies after a while, and I end up with a lot of errors like:

2006-01-08 09:32:14 [World Community Grid] Resuming computation for result ed997_19_5 using rosetta version 421
2006-01-08 09:42:02 [World Community Grid] Unrecoverable error for result ed997_19_5 (process exited with code 131 (0x83))
2006-01-08 09:42:02 [World Community Grid] Unrecoverable error for result ed997_19_5 (process exited with code 131 (0x83))
2006-01-08 09:42:02 [---] request_reschedule_cpus: process exited

as well as

2006-01-08 08:11:47 [World Community Grid] Started download of el002_15_el002.psipred
SIGSEGV: segmentation violation
Stack trace (3 frames):
./boinc[0x80845b2]
/lib/libpthread.so.0[0x401635d9]
/lib/libc.so.6[0x4004ae38]

Exiting...


Can anyone suggest what the heck is going on here and how I can fix it?
[Jan 8, 2006 8:52:32 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Issues with WCG errors

Knreed should be along to help with this. He's the Linux/WCG guru.
[Jan 9, 2006 12:42:20 PM]
Alther
Former World Community Grid Tech
United States of America
Joined: Sep 30, 2004
Post Count: 414
Re: Issues with WCG errors

What is your application stack limit?

You can find out by typing:
ulimit -s
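
If it is easier to check from a script, here is a minimal Python sketch of the same query. The helper name `stack_limit_kib` is made up for illustration. It reports the limit for the Python process itself, which is inherited from the shell that launched it, so it only matches what BOINC sees if both are started the same way.

```python
import resource


def stack_limit_kib():
    """Return the (soft, hard) stack limits in KiB, or "unlimited"."""
    def to_kib(value):
        # getrlimit reports bytes; bash's `ulimit -s` reports 1024-byte units.
        return "unlimited" if value == resource.RLIM_INFINITY else value // 1024

    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return to_kib(soft), to_kib(hard)


if __name__ == "__main__":
    soft, hard = stack_limit_kib()
    print(f"stack soft limit: {soft}  hard limit: {hard}")
```

The soft limit is the figure that `ulimit -s` prints by default.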
----------------------------------------
Rick Alther
Former World Community Grid Developer
[Jan 9, 2006 2:22:00 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Issues with WCG errors

The application stack limit is currently set to 8192.
[Jan 9, 2006 4:38:50 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Issues with WCG errors

Here is what I found for SIGSEGV: http://www.wlug.org.nz/SIGSEGV

So I guess the place to start is with
Resuming computation . . .

How are your BOINC preferences set up? Do you suspend programs and keep them in memory? Are you running multiple projects?

Also, could you describe your computer system? RAM, virtual memory partition? If you run ulimit -a to determine all your current limits, do they look reasonable (no surprises)?

Reading other bulletin boards, people run into various bugs when resuming projects. So my main thrust here is to come up with a way to run BOINC projects on your computer that works, even though I do not know just what the bugs are.
mycrofth
[Jan 9, 2006 11:47:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Issues with WCG errors

The machine is a dual Xeon with 2 GB of RAM and a 2 GB swap partition. Processes are not normally suspended (the machine acts as a server), and there's only one instance of BOINC running, hence only a single project at a time. The ulimit settings look reasonable as far as I can tell.
[Jan 10, 2006 10:26:14 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Issues with WCG errors

I haven't attempted to run BOINC with any projects other than WCG - one step at a time please! ;)
[Jan 10, 2006 11:23:17 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Issues with WCG errors

???
Here is the list of error codes: http://boinc-doc.net/boinc-wiki/index.php?title=Error_Code

But it leaves me perplexed.

The only idea I can come up with is to wonder whether the CPU temperature is reasonable.

Does anybody have any ideas?
mycrofth
[Jan 10, 2006 11:37:27 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Issues with WCG errors

The machine is a Dell PowerEdge server with a remote monitoring card (DRAC). The card indicates that the temperatures, fans and voltages are all within their normal tolerances.
[Jan 10, 2006 11:58:01 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Issues with WCG errors

I have since realized that I should have been looking at the 'Unrecoverable Error' messages at http://boinc-doc.net/boinc-wiki/index.php?title=Category:BOINC_Error_Message instead, but that does not help since 0x83 (the code in my log) is not listed.
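
One possible reading of the code, sketched in Python: the log at the top of the thread reports exit code 131, which is 0x83 in hex. Many Unix shells report a process killed by signal N as exit code 128 + N. I have not confirmed that BOINC reports exit codes this way, so treat the result as a hint at most. The helper name `decode_exit_code` is made up for illustration.

```python
import signal


def decode_exit_code(code):
    """If code looks like 128 + N, return (N, signal name); otherwise None.

    Assumes the shell convention of 128 + signal number, which may not
    be what BOINC uses for its "process exited with code" message.
    """
    if code <= 128:
        return None
    number = code - 128
    try:
        name = signal.Signals(number).name
    except ValueError:
        # No signal with this number on the current platform.
        name = "unknown"
    return number, name


if __name__ == "__main__":
    code = 131
    print(hex(code))               # 0x83
    print(decode_exit_code(code))  # signal number 3 under this convention
```

On Linux, signal 3 is SIGQUIT, which is different from the SIGSEGV shown in the client's own stack trace, so the two failures may not share a cause.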
[Jan 10, 2006 12:03:21 PM]