| World Community Grid Forums
Thread Status: Active Total posts in this thread: 19
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
wcg_beta_img_5.17_i686-pc-linux-gnu BETA_X0000045460272200502081427_ 1-- appears to have made an illegal memory reference at the end of its run. (At least, it was due to end at roughly that time.)
<core_client_version>5.8.15</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
About to call graphics init
dlopen() failed: libGL.so.1: cannot open shared object file: No such file or directory
No graphics.
INFO: No state to restore. Start from the beginning.
ERROR: Restoring checkpoint failed. Unable to restore state!
In ExtractGlcmFeatures: End of 0 iteration of outer loop.
In ExtractGlcmFeatures: End of 1 iteration of outer loop.
[... boring part edited to save space ... ]
In ExtractGlcmFeatures: End of 23 iteration of outer loop.
In ExtractGlcmFeatures: End of 24 iteration of outer loop.
</stderr_txt>
]]>
Any idea what went wrong, gurus? |
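When several failed results need to be compared, the two details that matter in a log like the one above are the signal number and the last outer-loop iteration that completed. The following is only a rough Python sketch, assuming the stderr text has exactly the wording shown in this post; `summarize_stderr` is a made-up helper name and is not part of BOINC or the WCG client.

```python
import re


def summarize_stderr(text):
    """Pull the signal number and last completed iteration out of a result log.

    Assumes the log wording shown in this thread; returns None for any
    field that cannot be found.
    """
    signal_match = re.search(r"process got signal (\d+)", text)
    iterations = [
        int(n)
        for n in re.findall(r"End of (\d+) iteration of outer loop", text)
    ]
    return {
        "signal": int(signal_match.group(1)) if signal_match else None,
        "last_iteration": max(iterations) if iterations else None,
    }


if __name__ == "__main__":
    sample = """
    <message> process got signal 11 </message>
    In ExtractGlcmFeatures: End of 23 iteration of outer loop.
    In ExtractGlcmFeatures: End of 24 iteration of outer loop.
    """
    print(summarize_stderr(sample))  # {'signal': 11, 'last_iteration': 24}
```

Running the same helper over the logs from two different machines makes it easy to see whether they stopped after the same iteration.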
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
It's Beta, so the gurus will likely say: What's the status code on the Result Status page? If Pending Validation, all is 'normal' in the result log. Else, leave it to the techs to analyse so they can fix it before the project launch.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sorry if it wasn't explicit, but I wouldn't have posted were it not that it returned Error.
I was only posting as I'm curious as to whether they know what went wrong. Also curious as to whether we get credit in these cases. |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
On 'error/invalid/valid' results: No, the beta policy is that it gets the same treatment on credits as regular work.
Added: https://secure.worldcommunitygrid.org/ms/device/viewBetaProfiles.do In either case, you will receive credit and points for Beta Test work just as you would for any other project.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All!
[Edit 1 times, last edit by Sekerob at Oct 28, 2007 1:09:53 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Else, leave it to the techs to analyse so they can fix it before the project launch."
Well, it's a shame they didn't. A project WU crashes exactly the same way. So what's the point of beta if you ignore the results? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
My reading tells me that signal 11 is usually caused by a hardware error. (I'm fairly sure segfaults generate a different error.) So, the first thing to do is to check the status of the other work units in your quorum. If they are returned as valid (or pending validation) then clearly you are alone in having this problem.
If that is the case, then it is time to do some hardware diagnostics. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"My reading tells me that signal 11 is usually caused by a hardware error. (I'm fairly sure segfaults generate a different error.)"
Signal 11 is, by definition, segmentation violation. It means that a process tried to access memory out of its process space, or tried to write into a read-only location. Yes, this can be caused by hardware faults. However, there are 3 reasons to believe this is not a hardware issue:
1) As far as I can remember, these two machines, which have devoted between them almost a year of CPU time to WCG, have never had an error on any other project.
2) They both crashed at exactly the same point of these WUs.
3) One of them is a dual-processor machine, and the other WU it was working on at the time was unaffected.
If the same project crashes in the same place on two different machines which are (apart from that one project) 100% reliable, I find the hardware fault hypothesis to be highly unlikely, to say the least. |
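For anyone who wants to check the signal numbering on their own Linux box, here is a minimal Python sketch (Python 3.8 or later, POSIX only). It is an illustration, not how BOINC itself reports errors: `describe_signal` and `run_child_that_segfaults` are hypothetical helper names, and the child process simply sends itself SIGSEGV rather than making a real bad memory access.

```python
import signal
import subprocess
import sys


def describe_signal(number):
    """Return the symbolic name and system description for a signal number."""
    sig = signal.Signals(number)
    return f"{sig.name} ({signal.strsignal(number)})"


def run_child_that_segfaults():
    """Start a child that kills itself with SIGSEGV and return its exit status.

    On POSIX, subprocess reports death by signal N as a return code of -N.
    """
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    result = subprocess.run([sys.executable, "-c", child_code])
    return result.returncode


if __name__ == "__main__":
    print(describe_signal(11))
    print(run_child_that_segfaults())
```

On Linux the first line should name SIGSEGV, and the second should be the negative of the signal number, which is the same information the client records as "process got signal 11".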
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yes, signal 11 is SIGSEGV - I just thought BOINC logged them differently. Maybe I've been in Windows-land for too long....
Please will you confirm that the other copies of this work unit succeeded? If they failed, the work unit is automatically reported to WCG. If not, then we need to work out what is special about the computer on which it failed. I know you want to rule out hardware issues, but it is the first thing we have to check. Do you overclock? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Please will you confirm that the other copies of this work unit succeeded? If they failed, the work unit is automatically reported to WCG. If not, then we need to work out what is special about the computer on which it failed. I know you want to rule out hardware issues, but it is the first thing we have to check. Do you overclock?"
Other copies of the WU succeeded. The one that failed on beta is a stock standard dual-P3/866 Dell server running slightly underclocked at 860.9 MHz. The one which failed most recently is an ancient Celeron overclocked to 75 MHz FSB rather than 66 MHz. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I will ask the techs to look this over.
Be advised, though, that they will check the invalid return rate for the project, and if it is low then looking into this will take a low priority. So far, you are the only member to experience this problem. Meanwhile, you can deselect the project - unless you feel like attaching a debugger and trying to get a stack trace to help the techs.... |
||