World Community Grid Forums
Thread Status: Active Total posts in this thread: 5
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Folks,
I'm seeing an issue with a "stuck" WU. The job is not using any CPU time, and none of the counters in BOINC Manager are incrementing. BOINC Manager shows the job running with high priority. The job details are:

WU Name: faah4081_Ampr_wH_xmd16730_01.dpf
BOINC version: 5.10.21
FAAH version: 6.05
O/S: Linux (kernel 2.6.21-6.fc7xen)

STDERR for this job is included at the end of this post. Has anyone else seen an issue like this, and if so, do you have any ideas about a possible fix? I've seen this happen once before (although I'm not sure whether that problem was also with FAAH). I had previously tried suspending and re-enabling the WU, but that didn't work for me.

Cheers,
Sean

STDERR begins:
*************
INFO:[00:22:43] Start AutoGrid...
autogrid: autogrid4: Successful Completion.
INFO:[00:24:51] End AutoGrid...
Beginning AutoDock...
INFO: Setting num_generations: 10000
_maxGenSeenSoFar changed: 2500
About to enter main loop...(dockings already completed: 0)
_maxGenSeenSoFar changed: 2626
_maxGenSeenSoFar changed: 2758
_maxGenSeenSoFar changed: 2896
_maxGenSeenSoFar changed: 3041
_maxGenSeenSoFar changed: 3194
_maxGenSeenSoFar changed: 3354
_maxGenSeenSoFar changed: 3522
_maxGenSeenSoFar changed: 3699
_maxGenSeenSoFar changed: 3885
_maxGenSeenSoFar changed: 4080
_maxGenSeenSoFar changed: 4285
_maxGenSeenSoFar changed: 4500
_maxGenSeenSoFar changed: 4726
_maxGenSeenSoFar changed: 4963
_maxGenSeenSoFar changed: 5212
_maxGenSeenSoFar changed: 5473
_maxGenSeenSoFar changed: 5747
_maxGenSeenSoFar changed: 6035
_maxGenSeenSoFar changed: 6337
_maxGenSeenSoFar changed: 6654
_maxGenSeenSoFar changed: 6987
_maxGenSeenSoFar changed: 7337
_maxGenSeenSoFar changed: 7704
_maxGenSeenSoFar changed: 8090
_maxGenSeenSoFar changed: 8495
_maxGenSeenSoFar changed: 8920
_maxGenSeenSoFar changed: 9367
_maxGenSeenSoFar changed: 9836
_maxGenSeenSoFar changed: 10328
Updating Best Energy for WU: 0.00
Finished Docking number 0
Updating Best Energy for WU: -8.66
Finished Docking number 1
Finished Docking number 2
Finished Docking number 3
Finished Docking number 4
Finished Docking number 5
Finished Docking number 6
Finished Docking number 7
Updating Best Energy for WU: -10.93
Finished Docking number 8
Finished Docking number 9
Finished Docking number 10
Finished Docking number 11
Finished Docking number 12
Finished Docking number 13
Finished Docking number 14
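The STDERR dump above shows how far the job got before it stopped. As a minimal sketch, assuming the stderr text has been copied into a string and that the message formats match the ones in this post (other application versions may log differently), a few regular expressions can pull out the progress markers. The function name `summarize_faah_stderr` is hypothetical, not part of BOINC or FAAH.

```python
import re


def summarize_faah_stderr(text: str) -> dict:
    """Summarize progress lines from a FAAH/AutoDock stderr dump.

    Assumes the message formats seen in this thread; returns None for
    any marker that does not appear in the text.
    """
    # \s+ tolerates messages that were wrapped across lines when pasted.
    dockings = [int(n) for n in re.findall(r"Finished\s+Docking\s+number\s+(\d+)", text)]
    energies = [float(e) for e in re.findall(
        r"Updating\s+Best\s+Energy\s+for\s+WU:\s*(-?\d+(?:\.\d+)?)", text)]
    generations = [int(g) for g in re.findall(r"_maxGenSeenSoFar\s+changed:\s*(\d+)", text)]
    return {
        "last_docking": max(dockings) if dockings else None,
        # The log only prints an update when the best energy improves,
        # so the most recent update is the current best.
        "best_energy": energies[-1] if energies else None,
        "max_generation": max(generations) if generations else None,
    }
```

Applied to the log in this post, it would report docking 14 as the last one finished, -10.93 as the best energy, and 10328 as the highest generation count. It only reads what was logged, so it cannot by itself tell a stalled task from a slow one.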
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
First, check whether BOINC is starved for CPU time. If another process is using most or all of the CPU time, BOINC will get little or none.
If the task really is stalled, try restarting BOINC. If all else fails, consider aborting the WU.
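One way to tell "starved" from "stalled" on Linux is to check whether the science application's CPU time is advancing at all. The sketch below reads the `utime` and `stime` fields of `/proc/<pid>/stat` (fields 14 and 15 in proc(5), in clock ticks) twice and compares them. It assumes a Linux `/proc` filesystem and that you have already found the task's PID with `ps` or `top`. The helper names and the 10-second default interval are illustrative choices, not anything BOINC provides.

```python
import os
import time


def cpu_seconds(pid: int) -> float:
    """Return user + system CPU time consumed so far by `pid` (Linux /proc only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so only split what follows the last closing parenthesis.
    # After that point, index 0 is field 3 (state), so field 14 is index 11.
    fields = stat[stat.rfind(")") + 2:].split()
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) / os.sysconf("SC_CLK_TCK")


def is_getting_cpu(pid: int, interval: float = 10.0, min_delta: float = 0.01) -> bool:
    """Sample CPU time twice; a task that is stalled or starved barely moves."""
    before = cpu_seconds(pid)
    time.sleep(interval)
    return cpu_seconds(pid) - before >= min_delta
```

If this returns False while `top` shows other processes saturating the CPUs, the task is probably being starved. If the machine has idle CPU and the task still makes no progress, it is more likely genuinely stalled, which is when a restart is worth trying.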
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Quote:
First, check whether BOINC is starved for CPU time. If another process is using most or all CPU time, BOINC will get little or none. If it is really stalled, try restarting BOINC. If all else fails, consider aborting.

Hi Didactylos,
thanks for your quick reply. The machine is a dual-Xeon box, and the other three cores were busy. However, I sent the stalled job a kill -HUP and it came back to life. I guess restarting BOINC, as you suggested, would have achieved the same result. Anyway, problem solved, and thanks for your help!

Regards,
Sean
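For reference, `kill -HUP <pid>` just delivers the SIGHUP signal to that process. A minimal Python equivalent is sketched below. What the FAAH science application does on receiving SIGHUP is not stated in this thread; a process with no handler for it is terminated by default, so treat this as an illustration of the signal, not a documented BOINC recovery procedure. The function name `send_hangup` is hypothetical.

```python
import os
import signal


def send_hangup(pid: int) -> bool:
    """Send SIGHUP to `pid`, the same signal `kill -HUP <pid>` delivers.

    Returns False if no process with that PID exists any more.
    """
    try:
        os.kill(pid, signal.SIGHUP)
    except ProcessLookupError:
        return False
    return True
```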
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
The part that is known not to work is "I had previously tried suspending and re-enabling the WU". The job has to be unloaded from memory, which is most often done by suspending WCG in the Projects tab for 30 seconds. This sends the job back to its last checkpoint. Shutting BOINC down completely is the next sure option and forces the issue, but it loses any progress made since the last checkpoint for all jobs in progress.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All!
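The "suspend WCG for 30 seconds, then resume" step can also be scripted with `boinccmd`, which supports `--project URL suspend` and `--project URL resume`. A minimal sketch, assuming `boinccmd` is on the PATH and can reach the local client (you may need to run it from the BOINC data directory or pass `--passwd`). The project URL below is an assumption: use the exact URL your client reports in `boinccmd --get_project_status`. The helper names are hypothetical.

```python
import subprocess
import time

# Assumption: replace with the URL your client lists for World Community Grid.
WCG_URL = "http://www.worldcommunitygrid.org/"


def project_cmd(url: str, op: str) -> list:
    """Build a boinccmd invocation that applies `op` to a single project."""
    if op not in ("suspend", "resume"):
        raise ValueError(f"unsupported operation: {op}")
    return ["boinccmd", "--project", url, op]


def bounce_project(url: str = WCG_URL, pause: float = 30.0) -> None:
    """Suspend the project, wait, then resume it."""
    subprocess.run(project_cmd(url, "suspend"), check=True)
    time.sleep(pause)
    subprocess.run(project_cmd(url, "resume"), check=True)
```

Keeping the command construction in a separate function makes it easy to inspect what will be run before calling `bounce_project()`. As noted above, resuming sends the job back to its last checkpoint, so any work done since then is redone.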
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi Sekerob,
cheers, and thanks a lot for the extra info. I'll keep this in mind for any future problems.

Regards,
Sean