Thread Status: Active
Total posts in this thread: 35
Posts: 35   Pages: 4   [ Previous Page | 1 2 3 4 | Next Page ]
This topic has been viewed 7654 times and has 34 replies
lidden
Cruncher
Joined: Dec 16, 2005
Post Count: 2
Status: Offline
Project Badges:
Re: Crashes

Does the Result Status page actually give an "Error" or "Invalid" report?


It is an "Error". It seems I have both 32-bit and 64-bit libs installed. I looked at
the man page for dlopen() but could not work out what arguments to give it in
a little test program to see what it does.

Selected output from "find / -iname 'libgl*'":
/usr/lib/libGLEW.so.1.4.0
/usr/lib/libGL.so.1
/usr/lib/libglib-2.0.so.0.1400.1
/usr/lib/libGLU.so.1.3.070001
/usr/lib/libglibmm-2.4.so.1
/usr/lib/libGL.so.100.14.19
/usr/lib/libglut.so.3
/usr/lib/libglibmm_generate_extra_defs-2.4.so.1.0.24
/usr/lib/libglib-2.0.so.0
/usr/lib/compiz/libglib.so
/usr/lib/libglibmm-2.4.so.1.0.24
/usr/lib/nvidia/libGL.so.1.xlibmesa
/usr/lib/nvidia/libGL.so.1.2.xlibmesa
/usr/lib/libGLEW.so.1.4
/usr/lib/libglib-1.2.so.0.0.10
/usr/lib/libglade-2.0.so.0.0.7
/usr/lib/gtk-2.0/2.10.0/engines/libglide.so
/usr/lib/libglib-1.2.so.0
/usr/lib/libGLU.so.1
/usr/lib/libglut.so.3.8.0
/usr/lib/libglade-2.0.so.0
/usr/lib/libglade
/usr/lib/libglibmm_generate_extra_defs-2.4.so.1
/usr/lib/libGLcore.so.100.14.19
/usr/lib/libGLcore.so.1
/usr/lib32/libGL.so.1
/usr/lib32/libglib-2.0.so.0.1400.1
/usr/lib32/libGLU.so.1.3.070001
/usr/lib32/libGL.so.100.14.19
/usr/lib32/libglut.so.3
/usr/lib32/libglib-2.0.so.0
/usr/lib32/nvidia/libGL.so.1.xlibmesa
/usr/lib32/nvidia/libGL.so.1.2.xlibmesa
/usr/lib32/libglade-2.0.so.0.0.7
/usr/lib32/gtk-2.0/2.10.0/engines/libglide.so
/usr/lib32/libGLU.so.1
/usr/lib32/libglut.so.3.8.0
/usr/lib32/libglade-2.0.so.0
/usr/lib32/libglade
/usr/lib32/libGLcore.so.100.14.19
/usr/lib32/libGLcore.so.1
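
For the dlopen() experiment mentioned above, a minimal sketch in Python may be easier than a C test program. On Linux, ctypes.CDLL wraps dlopen(), and a failed load raises OSError carrying the loader's own message. The two libGL paths are taken from the listing above; the helper name try_load is made up for this example, and which path fails (if any) depends on whether the interpreter itself is 32-bit or 64-bit.

```python
import ctypes
import struct


def try_load(path):
    """Try to load a shared library; return (ok, message).

    Hypothetical helper for illustration. On Linux, ctypes.CDLL
    goes through dlopen(), so the OSError text is the dynamic
    loader's own explanation of what went wrong.
    """
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return False, str(exc)
    return True, "loaded"


if __name__ == "__main__":
    # Pointer size tells us whether this process is 32-bit or 64-bit.
    bits = struct.calcsize("P") * 8
    print(f"this interpreter is {bits}-bit")

    # Paths copied from the find output in the post; adjust as needed.
    for path in ("/usr/lib/libGL.so.1", "/usr/lib32/libGL.so.1"):
        ok, message = try_load(path)
        print(path, "->", message)
```

If a library of the wrong word size is picked up, the message usually says so directly, which would help confirm or rule out a 32/64-bit mix-up.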
[Jan 27, 2008 2:02:43 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashes

Forgive my ignorance, but I'm curious. Linux works in much the same way for PCs that UNIX does in large server environments, complete with a vi text editor?
Isn't there a way to tweak the way an application runs or communicates, using the pipe command in vi? I seem to remember a whole day of instruction in UNIX class discussing this. It was 14 years ago though, and I can't find my notes! Good luck.
[Jan 28, 2008 12:23:21 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashes

Ok. More relevant question: Are they ever going to fix the error?

I enabled HCC again and it crashed again, exactly the same as the beta used to. This time, on an AMD machine. (WU X0000047371033200502110919_ 1, though the other WU crashed too!)

At the very least, if they did work out what the problem is, why not test for it and abort the jobs immediately, rather than have them run full length and then abort?!
[Jan 28, 2008 11:53:08 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashes

If this is the error I think it is, then the techs worked on the problem and reduced the error rate but they weren't able to eliminate the problem completely.

It is possible that on some (very few) computers, HCC work units will always fail. If you see this, I urge you to disable it. Other computers will pick up the slack and HCC won't suffer.

If this *isn't* the same error, then please tell me more.
[Jan 29, 2008 12:28:34 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashes

It is possible that on some (very few) computers, HCC work units will always fail. If you see this, I urge you to disable it. Other computers will pick up the slack and HCC won't suffer.


As someone who's worked in IT all my life, I find the vagueness inherently annoying. HCC has failed on every machine I've tried it on, in exactly the same way, and the techs can't fix it? I'm finding it hard not to say what I really think of that.
[Jan 29, 2008 1:32:28 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashes

We are running a debug version of the app in development attempting to catch additional information about the issue so that we can address it.


What information did you catch?
Is it a particular version of the Linux kernel?
Particular library versions?
Is it, as appears, happening after the run is complete? Could the data be saved by ignoring the SIGSEGV? Could you avoid calling the code which is causing the crash?
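
To make the "crash after the run is complete" question concrete, here is a rough sketch of how a wrapper could tell the two things apart: whether the process died from SIGSEGV, and whether its output was already on disk by then. This is not how the HCC application or the BOINC client works; the child script, the file name, and run_and_inspect are all invented for the illustration, and it assumes a Linux-like system.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Stand-in for a work unit: it writes its result, closes the file,
# and only then crashes with SIGSEGV (purely simulated here).
CHILD = """
import os, signal, sys
with open(sys.argv[1], "w") as fh:
    fh.write("result: 42\\n")
os.kill(os.getpid(), signal.SIGSEGV)
"""


def run_and_inspect(result_path):
    """Run the simulated job; return (crashed_with_sigsegv, result_saved)."""
    proc = subprocess.run([sys.executable, "-c", CHILD, result_path])
    # subprocess reports death by signal N as a return code of -N.
    crashed = proc.returncode == -signal.SIGSEGV
    saved = os.path.exists(result_path) and os.path.getsize(result_path) > 0
    return crashed, saved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        crashed, saved = run_and_inspect(os.path.join(workdir, "result.txt"))
        print("crashed with SIGSEGV:", crashed)
        print("result file saved:   ", saved)
```

Whether a result written before such a crash can be trusted is a separate question that only the project's own validation could answer.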

If we don't get the information that we need in development, then we will be putting it into beta to run on more computers.


Did that happen?

If you are getting errors on Linux, then you should consider temporarily disabling the project.


Temporarily meaning how much longer?

We are aggressively looking at this problem now and hope to have it resolved soon.


ETA?
[Jan 29, 2008 1:41:14 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashes

That was before the partial solution. I don't believe the techs are working further on this particular problem, but there is at least one other HCC issue under consideration right now.

If you want to know why *you* seem to have the problem on all the machines you've tried, then you will have to tell us what those computers have in common.

So far, you haven't told us a single thing about your operating system. All the details you just listed, in fact! Please, share - if you can spot the commonality, that will be a great help. If it's specific enough, the techs may be able to take another stab at it.
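
For anyone gathering the details asked for here, a small sketch like the following prints the basics from each machine so they can be compared side by side. It only uses Python's standard platform module; the function name system_report and the choice of fields are assumptions made for this example, not something the techs have requested.

```python
import platform


def system_report():
    """Collect a few facts that are easy to compare across machines."""
    info = platform.uname()
    # libc_ver() inspects the running interpreter binary; it may
    # return empty strings on systems where it cannot tell.
    libc_name, libc_version = platform.libc_ver()
    return {
        "system": info.system,
        "kernel_release": info.release,
        "machine": info.machine,
        "libc": f"{libc_name} {libc_version}".strip(),
    }


if __name__ == "__main__":
    for key, value in system_report().items():
        print(f"{key}: {value}")
```

Running it on each affected machine and posting the output would show at a glance whether the kernel release or C library is the same everywhere.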
[Jan 29, 2008 2:23:50 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashes

If you want to know why *you* seem to have the problem on all the machines you've tried, then you will have to tell us what those computers have in common.


One person's machines is an awfully small sample. Haven't you even worked out which range of kernels/libs/etc it fails for yet? I'd have thought a part of the debugging process would have been to have looked into the commonality on a wider scale in the first place.

The first thing to do would be to find out where the crash is occurring, which should have been very easy since it's quite reproducible. (That should have happened during beta testing.) Then, with the problem piece of code isolated, it could have either been fixed or the particular library or kernel that causes problems isolated so that we all know exactly what HCC can't cope with.

My machines are all running Debian with 2.4.27-3-686 kernels and no graphics. Back when the beta was running, some were probably running an earlier 2.4 kernel.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jan 29, 2008 2:59:47 AM]
[Jan 29, 2008 2:49:29 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashes

As I said before, the techs are finished with this problem. If you want them to look again, we need something concrete.

Are these machines headless?

I know blaming the techs is easy, but it doesn't help them do their job, and it certainly doesn't help you. Blaming me might be fun, but I'm a volunteer trying to help you.

You will remember from earlier in the thread, the error rate was 8%. Just going from memory, the current error rate is 3%. That probably includes aborts and other lost results, so while it is higher than we would like, it's probably not as high as it looks.
[Jan 29, 2008 3:16:07 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Crashes

Are these machines headless?


Not strictly. They have keyboards attached (partly because some BIOSes refuse to boot with no keyboard present) and some have monitors attached, though these stay turned off unless I need to see the BIOS messages, for example.
[Jan 29, 2008 3:37:32 AM]