Total posts in this thread: 8
This topic has been viewed 659 times and has 7 replies
Dirk Gently
Senior Cruncher
England
Joined: Mar 1, 2005
Post Count: 153
Questions about "Unhandled exception" error in HPF WU under BOINC

I run BOINC 5.2.1.3 and participate in Climate Prediction and WCG Projects - FAAH and Human Proteome Folding.

Machine is 3GHz Pentium D with 1GB RAM and 3GB Swapfile, XP prof SP2.

My 2 projects run happily with a processor each. Processor runs at 55-58 degC since I ditched the clogged up (within a month!) standard Intel processor cooler in favour of a heatpipe cooler!

I've been running BOINC for about a month (previously using the UD client for FAAH and HPF for a year). I've now crunched quite a lot of WCG units with BOINC, but have had TWO WCG Human Proteome Folding units terminate with an error since switching to BOINC.

At the time of the first failure I was doing some quite memory-intensive image processing of very large scanned images, but I have plenty of RAM and swap space! For the second, I had just suspended all projects ready for a Windows restart (isn't this supposed to be the correct thing to do?). I'll mention at this point that I have set my projects to unload from memory while preempted, because I have seen other reports about this.

Time I asked a question! The obvious one is why did it happen, but for me there are 2 more important ones.

1) Is this a problem only with HPF work units? (I cannot be sure based on only 2 failures.)

2) Why was the WU totally lost? Why not go back to the last checkpoint?
A total of 12 hours of work was lost. I'd hate it to happen with my Climate Prediction project (currently at 19% after 200 hours!!!)

Snapshots from the error log are pasted below:

FIRST FAILURE

[04:57:59] BEGIN - L100 (structure 93)
Convergance ratio: 92:506 (0.181818)
ran3() called: 214465
[04:59:05] BEGIN - L100 (structure 93)
ERROR
Exception occurred while running Rosetta:
Operating System: Windows XP Service Pack 2 Build #: 2600

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C910E03 write attempt to address 0x00000004


SECOND FAILURE

Starting checkpoint of structure: 70
Completed Checkpoint of structure 70
[20:48:45] BEGIN - L100 (structure 71)

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C910E03 write attempt to address 0x00000000

ERROR
Exception occurred while running Rosetta:
Operating System: 1: 03/04/06 20:49:03
Windows XP Service Pack 2 Build #: 2600

It seems significant that both failures show the same error at the same address. I would be grateful for any comments that anyone has.
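
To make the comparison between the two failures concrete, here is a minimal Python sketch that parses the two "Reason:" lines pasted above and checks what they have in common. The regular expression is an assumption based only on these two lines (the "read" alternative in particular is a guess at the log format), and the 0x1000 threshold is just a rough stand-in for "within the first memory page". A write to an address that close to zero usually points to a null pointer being dereferenced, though the log alone cannot confirm that.

```python
import re

# Pattern inferred from the two "Reason:" lines in the log excerpts.
# Assumption: a read fault would be worded the same way with "read".
REASON_RE = re.compile(
    r"Reason: (?P<kind>.+?) \((?P<code>0x[0-9a-fA-F]+)\) "
    r"at address (?P<pc>0x[0-9a-fA-F]+) "
    r"(?P<access>read|write) attempt to address (?P<target>0x[0-9a-fA-F]+)"
)


def parse_reason(line):
    """Return the fields of a 'Reason:' line as a dict, or None if no match."""
    m = REASON_RE.search(line)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "code": int(m.group("code"), 16),
        "pc": int(m.group("pc"), 16),          # address of the faulting instruction
        "access": m.group("access"),
        "target": int(m.group("target"), 16),  # address the code tried to touch
    }


first = parse_reason(
    "Reason: Access Violation (0xc0000005) at address 0x7C910E03 "
    "write attempt to address 0x00000004"
)
second = parse_reason(
    "Reason: Access Violation (0xc0000005) at address 0x7C910E03 "
    "write attempt to address 0x00000000"
)

# Same faulting instruction in both runs?
same_site = first["pc"] == second["pc"]
# Both targets within the first 4 KB, i.e. effectively a null pointer?
near_null = all(f["target"] < 0x1000 for f in (first, second))

print(same_site, near_null)  # prints: True True
```

The target addresses differ slightly (0x00000004 versus 0x00000000), but the exception code and the faulting instruction address are identical in both runs.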

Robert
----------------------------------------
[Mar 5, 2006 11:18:27 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Questions about "Unhandled exception" error in HPF WU under BOINC

"I'll mention at this point that I have set my projects to unload from memory while preempted"

There is a known bug in BOINC that sometimes strikes unless you choose 'Leave in memory when preempted'. But if you do choose that, make sure that you have enough Virtual Memory to handle everything. I suspect that you have run into this bug.

Lawrence
[Mar 5, 2006 2:42:21 PM]
Dirk Gently
Senior Cruncher
England
Joined: Mar 1, 2005
Post Count: 153
Re: Questions about "Unhandled exception" error in HPF WU under BOINC

Thanks for the reply, Lawrence.

I think that answers question 1: there is a general problem in BOINC, so it could happen to work units from other projects, not just HPF.

I am going to change to "Leave in memory while preempted", but I would think this introduces dangers of its own. After all, if the PC crashes (very rare these days, I know), everything in memory, including virtual memory, is lost.

What is the best way to restart the PC? Suspend projects first and then exit BOINC? Or just let Windows do it?

I still cannot see why the workunit did not recover from its last checkpoint. What else are checkpoints for?

Robert
----------------------------------------
[Mar 5, 2006 4:34:42 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Questions about "Unhandled exception" error in HPF WU under BOINC

"I still cannot see why the workunit did not recover from its last checkpoint. What else are checkpoints for?"

I suspect (just a personal opinion) that when the application suddenly returns an error code, BOINC just sends that error code to the server and starts on another work unit. Yes, in this particular case, restarting from a checkpoint should work, but that would not work for all errors.
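
To show what "restarting from a checkpoint" would mean in practice, here is a small Python sketch of a checkpoint-and-resume loop. It is not BOINC's or Rosetta's actual code; every name in it (`load_checkpoint`, `save_checkpoint`, `run`, the JSON file layout) is hypothetical and chosen only for illustration. It mirrors the second failure in the log, where the checkpoint for structure 70 completed and the crash came during structure 71.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the last completed structure, or 0 if none exists."""
    try:
        with open(path) as f:
            return json.load(f)["completed"]
    except (FileNotFoundError, ValueError, KeyError):
        # Missing or unreadable checkpoint: start from the beginning.
        return 0


def save_checkpoint(path, completed):
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"completed": completed}, f)
    os.replace(tmp, path)


def run(path, total, crash_at=None):
    """Process structures 1..total, resuming after the last checkpoint.

    Returns the checkpoint value this run started from.
    """
    start = load_checkpoint(path)
    for i in range(start + 1, total + 1):
        if i == crash_at:
            raise RuntimeError(f"simulated crash in structure {i}")
        save_checkpoint(path, i)  # structure i is done
    return start


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "checkpoint.json")
    try:
        run(ckpt, total=100, crash_at=71)  # first attempt dies in structure 71
    except RuntimeError:
        pass
    resumed_from = run(ckpt, total=100)    # second attempt picks up after 70
    final = load_checkpoint(ckpt)

print(resumed_from, final)  # prints: 70 100
```

A scheme like this only helps if whatever supervises the application restarts it after a failure. If the failure is instead reported as a final result, as suggested above, the checkpoint on disk is never read again.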

Lawrence
[Mar 5, 2006 8:46:00 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Questions about "Unhandled exception" error in HPF WU under BOINC

I also have this problem with errors in BOINC. How can I set the option "Leave in memory while preempted"? I cannot find the option.

Huib
Netherlands
[Mar 10, 2006 7:27:17 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Questions about "Unhandled exception" error in HPF WU under BOINC

The full phrase is 'Leave applications in memory while preempted'. This option is one of the General Preferences in BOINC explained at http://boinc-wiki.ath.cx/index.php?title=Preferences_in_the_BOINC_System

This option can really put a burden on your Virtual Memory file size.

Added: Because of VM requirements, I would prefer not to choose this option. But until this bug is fixed, it is almost mandatory to select it if you will ever run more than one BOINC work unit at a time.

Later: Over on Rosetta@home, David Baker posted this on 9 March 2006 at http://boinc.bakerlab.org/rosetta/forum_thread.php?id=1177#11811:
"Good news on the tracking down the remaining errors front--a long time BOINC and SETI@home expert, Rom, is joining us as a consultant to help fix the "leave in memory", graphics, and other problems related to the rosetta-BOINC interface."

So people are definitely working on this irritating bug.
----------------------------------------
[Edit 2 times, last edit by Former Member at Mar 10, 2006 2:21:22 PM]
[Mar 10, 2006 1:54:44 PM]
Dirk Gently
Senior Cruncher
England
Joined: Mar 1, 2005
Post Count: 153
Re: Questions about "Unhandled exception" error in HPF WU under BOINC

"I also have this problem with errors in BOINC. How can I set the option 'Leave in memory while preempted'? I cannot find the option.

Huib
Netherlands"

It is not in the BOINC program itself. You need to log in to your account and select "Device Manager" under "My Grid". You will find the setting if you edit the profile that your PC is set to use (probably "Home").
----------------------------------------
[Mar 11, 2006 12:02:45 PM]
Dirk Gently
Senior Cruncher
England
Joined: Mar 1, 2005
Post Count: 153
Re: Questions about "Unhandled exception" error in HPF WU under BOINC

"Added: Because of VM requirements, I would prefer not to choose this option. But until this bug is fixed, it is almost mandatory to select it if you will ever run more than one BOINC work unit at a time."

Might be an idea if WCG were to set this in the default profiles until the bug is fixed. It could also be conditional on the machine having enough VM.

I have just lost another WU on a new machine - before I got around to changing it!
----------------------------------------
[Mar 11, 2006 12:18:08 PM]