Thread Status: Active
Total posts in this thread: 5
This topic has been viewed 874 times and has 4 replies
cqexbesd
Cruncher
Joined: Oct 13, 2008
Post Count: 14
Status: Offline
Tasks falling asleep

I have BOINC, attached to WCG and other projects such as Rosetta and SIMAP, running on 2 FreeBSD 7 boxes (both with Linux emulation enabled). Sometimes I find that a WCG task has "fallen asleep" - that is, it is not consuming any CPU time - even though the boinc_gui claims it should be running. If I suspend and resume the task via the GUI it comes back to life. Both machines have 2 processors and I have only ever seen it on one processor at a time. I haven't found a definite pattern but it does seem to occur after something else has used a lot of processor time (e.g. a full CPU's worth - the longer the more likely the problem is to occur). I only see this with WCG tasks. The current task is hcc. I don't recall seeing it with any other tasks but I can't be 100% sure there.

If I use ps I see the process is marked as sleeping. truss shows nothing (unsurprisingly). Interestingly the process seems to have a zombie...

USER   PID PPID %CPU %MEM   RSS TT STAT STARTED      TIME COMMAND
boinc 8494  767  0.0  5.6 28924 ?? IN   Wed11AM 812:35.32 wcg_hcc1_img_6.03_i686-pc-linux-gnu X0000053540352200507282103.jp2
boinc 8495 8494  0.0  0.0     0 ?? ZN   Wed11AM   0:46.55 <defunct>
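
For anyone who wants to spot this state without reading ps output by eye, here is a minimal Python sketch. It assumes the listing has the same column order as the one above (USER, PID, PPID, ..., with STAT in the eighth column); the function name find_zombie_parents is made up for this example and is not part of BOINC or any other tool.

```python
def find_zombie_parents(ps_output: str) -> dict[int, list[int]]:
    """Map each parent PID to the PIDs of its zombie children.

    Assumption: columns are USER PID PPID %CPU %MEM RSS TT STAT ...,
    as in the listing quoted in this post.
    """
    zombies: dict[int, list[int]] = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines, truncated lines and the header row.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        pid, ppid, stat = int(fields[1]), int(fields[2]), fields[7]
        # A STAT value starting with "Z" marks a zombie (defunct) process.
        if stat.startswith("Z"):
            zombies.setdefault(ppid, []).append(pid)
    return zombies


sample = """\
USER PID PPID %CPU %MEM RSS TT STAT STARTED TIME COMMAND
boinc 8494 767 0.0 5.6 28924 ?? IN Wed11AM 812:35.32 wcg_hcc1_img_6.03_i686-pc-linux-gnu X0000053540352200507282103.jp2
boinc 8495 8494 0.0 0.0 0 ?? ZN Wed11AM 0:46.55 <defunct>
"""
print(find_zombie_parents(sample))  # {8494: [8495]}
```

A parent that shows up in the result while itself sitting in a sleeping state is a candidate for the "fallen asleep" condition described here.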

Obviously this leaves some CPU time idle that might otherwise be going to a good cause. I searched the forums to see if this had come up before but with no success.

Any clues?
[Dec 11, 2008 3:10:40 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Tasks falling asleep

The Mac version of BOINC had a problem with zombie processes. I'm not sure whether that was resolved, but at the time it was thought to be Mac-specific, not affecting other platforms.

My only conjecture is that something is interfering with the "heartbeat" mechanism, which is how BOINC tracks the health and status of its child processes.

Please will you check the error log for one of these processes after you have restarted it? You will find it in stderr.txt in one of the slot directories. Post the contents here.
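
If it helps, the logs from every slot can be gathered in one go. This is only a sketch: it assumes the slot directories sit under a "slots" subdirectory of the BOINC data directory, and the location of that data directory differs between installations, so it is passed in as an argument. The function name read_slot_stderr is hypothetical.

```python
import sys
from pathlib import Path


def read_slot_stderr(data_dir: str) -> dict[str, str]:
    """Return {slot directory name: contents of its stderr.txt}."""
    logs = {}
    # Assumption: slots live in <data_dir>/slots/<number>/ .
    for path in sorted(Path(data_dir).glob("slots/*/stderr.txt")):
        # errors="replace" keeps going if the log has odd bytes in it.
        logs[path.parent.name] = path.read_text(errors="replace")
    return logs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py /path/to/boinc/data/directory
    for slot, text in read_slot_stderr(sys.argv[1]).items():
        print(f"--- slot {slot} ---")
        print(text)
```

Slots without a stderr.txt are simply skipped, so the output only lists tasks that have written something.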

Thank you.
[Dec 11, 2008 8:53:30 PM]
cqexbesd
Cruncher
Joined: Oct 13, 2008
Post Count: 14
Status: Offline
Re: Tasks falling asleep

Please will you check the error log for one of these processes after you have restarted it? You will find it in stderr.txt in one of the slot directories. Post the contents here.


Yep... I have to wait for it to happen again, as the process quoted above has finished now. I would expect it to happen again today or on Monday.

Thanks for your quick response!

Andrew
[Dec 12, 2008 3:26:09 PM]
cqexbesd
Cruncher
Joined: Oct 13, 2008
Post Count: 14
Status: Offline
Re: Tasks falling asleep

You will find it in stderr.txt in one of the slot directories. Post the contents here.


Well, I cheated and set off a big compile, and the freeze happened in about 10 minutes. stderr.txt looks like this:

Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1230080915.000000
Skipping: /computation_deadline
In ExtractGlcmFeatures: End of 0 iteration of outer loop.
In ExtractGlcmFeatures: End of 1 iteration of outer loop.
In ExtractGlcmFeatures: End of 2 iteration of outer loop.
In ExtractGlcmFeatures: End of 3 iteration of outer loop.
In ExtractGlcmFeatures: End of 4 iteration of outer loop.
In ExtractGlcmFeatures: End of 5 iteration of outer loop.
In ExtractGlcmFeatures: End of 6 iteration of outer loop.
In ExtractGlcmFeatures: End of 7 iteration of outer loop.
In ExtractGlcmFeatures: End of 8 iteration of outer loop.
In ExtractGlcmFeatures: End of 9 iteration of outer loop.
SIGILL: illegal instruction
Stack trace (10 frames):
[0x81d218b]
[0x8238f64]
[0xbfbfffbb]
[0x8054b42]
[0x805c411]
[0x806d0fc]
[0x807d741]
[0x807dae8]
[0x823b1da]
[0x8048131]

Exiting...
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1230080915.000000
Skipping: /computation_deadline

--

This is an hcc task as well. I wonder if it's an imperfect emulation of a Linux system call that only gets called when available CPU is low?
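
To compare crash signatures between tasks, the log can be boiled down to the signal and the stack addresses. This is a rough Python sketch that assumes every crash is logged in the same shape as the excerpt above (a "SIGxxx: description" line followed by bracketed hex addresses); that format is inferred from this one log only, and the name summarize_crashes is invented for the example.

```python
import re

# Patterns inferred from the single stderr.txt excerpt in this post.
SIGNAL_LINE = re.compile(r"^(SIG[A-Z]+): (.+)$")
FRAME_LINE = re.compile(r"^\[(0x[0-9a-fA-F]+)\]$")


def summarize_crashes(log_text: str) -> list[dict]:
    """Return one entry per signal line, with the frames that follow it."""
    crashes = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        sig = SIGNAL_LINE.match(line)
        if sig:
            current = {
                "signal": sig.group(1),
                "description": sig.group(2),
                "frames": [],
            }
            crashes.append(current)
            continue
        frame = FRAME_LINE.match(line)
        # Only collect addresses that come after a signal line.
        if frame and current is not None:
            current["frames"].append(frame.group(1))
    return crashes
```

If the same addresses turn up at the top of the trace each time, that would point at one specific code path rather than a random failure.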

Thanks,

Andrew
[Dec 12, 2008 3:48:16 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Tasks falling asleep

You could very well be right.

The remaining question is why BOINC didn't handle the crash correctly. Judging from the error log, I don't think you have the latest BOINC version. Upgrading may not help, but it's the only option I can come up with.
[Dec 13, 2008 12:28:35 AM]