Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Problem with 'stuck' WU under FAAH 6.05

Folks,
I'm seeing an issue with a 'stuck' WU. The job is not using any CPU resources and none of the counters in BOINC manager are incrementing. BOINC manager shows the job running with high priority. The job details are:

WU Name: faah4081_Ampr_wH_xmd16730_01.dpf
BOINC version: 5.10.21
FAAH version: 6.05
O/S: Linux (kernel 2.6.21-6.fc7xen)

STDERR for this job is included at the end of this post.

Has anyone else seen an issue like this and (if so) do you have any ideas about a possible fix? I'd seen this happen once before (although I'm not sure if the problem was also with FAAH). I had previously tried suspending and re-enabling the WU - but that didn't work for me.

Cheers
Sean

STDERR begins:
*************
INFO:[00:22:43] Start AutoGrid...

autogrid: autogrid4: Successful Completion.
INFO:[00:24:51] End AutoGrid...
Beginning AutoDock...
INFO: Setting num_generations: 10000
_maxGenSeenSoFar changed: 2500
About to enter main loop...(dockings already completed: 0)
_maxGenSeenSoFar changed: 2626
_maxGenSeenSoFar changed: 2758
_maxGenSeenSoFar changed: 2896
_maxGenSeenSoFar changed: 3041
_maxGenSeenSoFar changed: 3194
_maxGenSeenSoFar changed: 3354
_maxGenSeenSoFar changed: 3522
_maxGenSeenSoFar changed: 3699
_maxGenSeenSoFar changed: 3885
_maxGenSeenSoFar changed: 4080
_maxGenSeenSoFar changed: 4285
_maxGenSeenSoFar changed: 4500
_maxGenSeenSoFar changed: 4726
_maxGenSeenSoFar changed: 4963
_maxGenSeenSoFar changed: 5212
_maxGenSeenSoFar changed: 5473
_maxGenSeenSoFar changed: 5747
_maxGenSeenSoFar changed: 6035
_maxGenSeenSoFar changed: 6337
_maxGenSeenSoFar changed: 6654
_maxGenSeenSoFar changed: 6987
_maxGenSeenSoFar changed: 7337
_maxGenSeenSoFar changed: 7704
_maxGenSeenSoFar changed: 8090
_maxGenSeenSoFar changed: 8495
_maxGenSeenSoFar changed: 8920
_maxGenSeenSoFar changed: 9367
_maxGenSeenSoFar changed: 9836
_maxGenSeenSoFar changed: 10328
Updating Best Energy for WU: 0.00
Finished Docking number 0
Updating Best Energy for WU: -8.66
Finished Docking number 1
Finished Docking number 2
Finished Docking number 3
Finished Docking number 4
Finished Docking number 5
Finished Docking number 6
Finished Docking number 7
Updating Best Energy for WU: -10.93
Finished Docking number 8
Finished Docking number 9
Finished Docking number 10
Finished Docking number 11
Finished Docking number 12
Finished Docking number 13
Finished Docking number 14
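To see at a glance how far a job got before stalling, the STDERR text above can be summarized with a few regular expressions. This is only a sketch that assumes the log lines keep the exact wording shown in this post; the function name `summarize_stderr` is made up for illustration and is not part of BOINC or FAAH.

```python
import re


def summarize_stderr(text):
    """Summarize docking progress from STDERR text shaped like the log above.

    Assumes the exact line formats seen in this thread; returns None for
    any value that does not appear in the text.
    """
    finished = [int(n) for n in re.findall(r"Finished Docking number (\d+)", text)]
    energies = [
        float(e)
        for e in re.findall(r"Updating Best Energy for WU: (-?\d+(?:\.\d+)?)", text)
    ]
    generations = [int(g) for g in re.findall(r"_maxGenSeenSoFar changed: (\d+)", text)]
    return {
        # Highest-numbered docking reported as finished so far.
        "last_docking": finished[-1] if finished else None,
        # Most recently reported best energy for the work unit.
        "last_best_energy": energies[-1] if energies else None,
        # Most recently reported _maxGenSeenSoFar value.
        "last_max_generations": generations[-1] if generations else None,
    }
```

Running it twice a few minutes apart and comparing the results gives a rough indication of whether the job is still making progress.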
[May 24, 2008 12:20:30 PM]
Former Member
Re: Problem with 'stuck' WU under FAAH 6.05

First, check whether BOINC is starved for CPU time. If another process is using most or all CPU time, BOINC will get little or none.

If it is really stalled, try restarting BOINC.

If all else fails, consider aborting.
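One way to tell a starved job from a stalled one on Linux is to check whether the process is accumulating any CPU time at all. The sketch below reads `utime` and `stime` (fields 14 and 15 of `/proc/<pid>/stat`) twice and compares them. The function names and the 5-second default interval are illustrative choices, and you would need to look up the PID of the science application yourself.

```python
import time


def cpu_ticks_from_stat(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the LAST ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]) + int(rest[12])


def is_accruing_cpu(pid, interval=5.0):
    """Return True if the process used any CPU time during `interval` seconds."""

    def read_ticks():
        with open(f"/proc/{pid}/stat") as stat_file:
            return cpu_ticks_from_stat(stat_file.read())

    before = read_ticks()
    time.sleep(interval)
    return read_ticks() > before
```

A job that is merely starved should still show the counter creeping up over a longer interval, whereas a truly stalled one stays flat.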
[May 24, 2008 12:52:56 PM]
Former Member
Re: Problem with 'stuck' WU under FAAH 6.05

Hi Didactylos,
thanks for your quick reply. The machine is a 2 x Xeon and the other 3 cores were busy. However, I gave the stalled job a kill -HUP and it came back to life. I guess restarting BOINC, as you'd suggested, would have achieved the same result. Anyway, problem solved, and thanks for your help!
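For reference, the scripted equivalent of `kill -HUP <pid>` is a one-line call. This is a minimal sketch; `send_hup` is an illustrative name. Note that what a process does on SIGHUP is up to the process (the default action is to terminate it), and nothing in this thread establishes why the signal revived the job, so treat it as something that happened to work here and not as a documented fix.

```python
import os
import signal


def send_hup(pid):
    """Send SIGHUP to the process with the given PID (same as `kill -HUP pid`).

    Raises ProcessLookupError if no such process exists and PermissionError
    if the caller is not allowed to signal it.
    """
    os.kill(pid, signal.SIGHUP)
```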

Regards,
Sean
[May 24, 2008 3:20:56 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Problem with 'stuck' WU under FAAH 6.05

It's this "I had previously tried suspending and re-enabling the WU" that is known not to work. It's necessary to unload the job from memory, which is most often achieved by suspending WCG in the Projects tab for 30 seconds. This kicks the job back to its last checkpoint. Unloading BOINC completely is the next sure thing and forces the issue, but it loses any progress made after the last checkpoint for all jobs in progress.
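The suspend, wait 30 seconds, resume sequence can also be scripted on a headless machine. The sketch below assumes the BOINC command-line tool is on the PATH as `boinccmd` (older clients may ship it under a different name, hence the `tool` parameter) and that the project URL shown is the one the client is attached to; check both locally. Whether suspending really unloads the task from memory depends on client settings that this thread does not cover.

```python
import subprocess
import time

# Assumption: use the URL your client actually shows for World Community Grid.
WCG_URL = "http://www.worldcommunitygrid.org/"


def project_command(url, operation, tool="boinccmd"):
    """Build the command line for a project-level operation (suspend/resume)."""
    return [tool, "--project", url, operation]


def bounce_project(url=WCG_URL, pause_seconds=30, run=subprocess.run, sleep=time.sleep):
    """Suspend the project, wait, then resume it.

    `run` and `sleep` are injectable so the sequence can be dry-run
    without a BOINC client present.
    """
    run(project_command(url, "suspend"), check=True)
    sleep(pause_seconds)
    run(project_command(url, "resume"), check=True)
```

Calling `bounce_project()` with the defaults runs the real commands; passing a `run` function that only prints its argument gives a dry run.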
[May 24, 2008 3:41:18 PM]
Former Member
Re: Problem with 'stuck' WU under FAAH 6.05

Hi Sekerob,
cheers and thanks a lot for the extra info. I'll keep this in mind for any future problems that may occur.

Regards,
Sean
[May 24, 2008 8:38:57 PM]