Thread Status: Active
Total posts in this thread: 9
This topic has been viewed 2252 times and has 8 replies
jjhc
Cruncher
Joined: Jan 5, 2018
Post Count: 2
Status: Offline
Is this a runaway task, and what should I do about it (if anything)?

Hello:
My first time in this forum, apologies if I break any rules...

I've been running this software for a few weeks now.
So far so good, but there is one oddity I have noticed.

I have a task (MCM1_0139548_9816_1) which appears, to my untrained eye, to
have gone off on its own into la-la land. It has been sitting at 93.750% complete
for several days, is running at high priority, elapsed time is 96+ hours, time
remaining is 5+ hours and slowly increasing (by about 1 second every 15 seconds
elapsed), and its deadline was about 20 hours ago.

There are no error messages that I can find, although to be fair, I'm not sure
that I have looked everywhere. There's nothing obvious in the event log
(although this is tens of thousands of lines long so I may have missed
something: is there any way to search it?), and nothing in the stdoutdae file,
and all the stderr files are empty.
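
On the question of searching the event log: below is a minimal sketch, assuming the log has been saved as a plain text file (for example a copy of the client's stdoutdae file). The function name `find_in_log` and the file path in the usage note are hypothetical and not part of BOINC.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_in_log(log_path: str, needle: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every log line that mentions `needle`.

    The match is case-insensitive so a task name can be typed either way.
    """
    needle = needle.lower()
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                yield number, line.rstrip("\n")
```

Calling `list(find_in_log("event_log.txt", "MCM1_0139548_9816_1"))` would then return only the lines that mention that task, each with its line number, which is easier than reading tens of thousands of lines by eye.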

This is on a Windows 7 machine, all patched up except for the most recent
round of fixes (the Spectre/Meltdown ones). The Results web page just says
'No Reply' for this task.

So my question is: what, if anything, should I do? Let the task run until
something closes it down automatically? Abort it myself? Something else?

If there is any more useful data I can supply, please let me know what this
might be.

Thanks for any advice. Sorry if this info is in some FAQ somewhere - I did look
but found nothing useful there.

Jonathan
[Jan 24, 2018 11:31:39 PM]
pcwr
Ace Cruncher
England
Joined: Sep 17, 2005
Post Count: 10903
Status: Offline
Re: Is this a runaway task, and what should I do about it (if anything)?

Welcome.

Have you tried to restart the BOINC service or reboot the computer?
This will make the WU continue from a known good point.

Patrick
[Jan 25, 2018 12:18:30 AM]
jjhc
Cruncher
Joined: Jan 5, 2018
Post Count: 2
Status: Offline
Re: Is this a runaway task, and what should I do about it (if anything)?

I have not tried restarting BOINC. How does one do that?

Rebooting the PC is a multi-hour task and one I tend to avoid until it
becomes necessary.
[Jan 25, 2018 5:20:30 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Is this a runaway task, and what should I do about it (if anything)?

Hi,

I have started seeing processes like the following in a ps output ...

boinc 11246 0.0 0.0 142276 28 ? SN 2017 1:25 \_ ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.36_x86_64-pc-linux-gnu -SettingsFile MCM1_0138935_1855.txt -DatabaseFile dataset-curatedOvarian_EarlyLate_v1.0.txt
boinc 13200 0.0 0.0 126424 24 ? SN Jan01 0:33 \_ ../../projects/www.worldcommunitygrid.org/wcgrid_fahb_bedam_7.18_x86_64-pc-linux-gnu -seed 319670693 -trickle 0 -upload 0 -wcgval 10000

They seem to be sitting there not using any CPU; one of them started last year sometime and seems to be stuck. I have noticed this on multiple machines and I have just been killing the processes when I see them.

I wonder if it's the same thing you're seeing?

It's not really a problem from my point of view, and it looks like BOINC grabs more work when this happens anyway (so I have 14 processes running on a 12-core box when 2 are stuck).
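
The manual check described above (reading ps output and looking for science applications that show no CPU use) can be sketched as follows. This is an illustration only: it assumes `ps aux`-style columns (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) as in the lines quoted in this post, and the names `PsRow`, `parse_ps_line` and `idle_science_apps` are hypothetical. Note that on Linux the %CPU column is an average over the life of the process, so 0.0 means the process has done very little since it started.

```python
from typing import List, NamedTuple


class PsRow(NamedTuple):
    pid: int
    cpu_percent: float
    rss_kib: int
    state: str
    started: str
    command: str


def parse_ps_line(line: str) -> PsRow:
    """Split one `ps aux`-style line into the columns used below."""
    # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError(f"unexpected ps line: {line!r}")
    return PsRow(
        pid=int(fields[1]),
        cpu_percent=float(fields[2]),
        rss_kib=int(fields[5]),
        state=fields[7],
        started=fields[8],
        command=fields[10],
    )


def idle_science_apps(lines: List[str], marker: str = "wcgrid_") -> List[PsRow]:
    """Return rows for science apps that are sleeping and show 0.0 %CPU."""
    # Only lines whose command contains the marker are parsed at all.
    rows = [parse_ps_line(line) for line in lines if marker in line]
    return [r for r in rows if r.cpu_percent == 0.0 and r.state.startswith("S")]
```

Fed the two lines quoted above, `idle_science_apps` would flag both PIDs. A row flagged this way is only a candidate: a healthy task that happens to be preempted or suspended could look similar in a single snapshot.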

Just thought I would say.

Matt.
[Jan 30, 2018 2:51:53 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: Is this a runaway task, and what should I do about it (if anything)?

"They seem to be sitting there not using any CPU; one of them started last year sometime and seems to be stuck. I have noticed this on multiple machines and I have just been killing the processes when I see them."

If they started last year, they're almost certainly expired. Just manually abort them from the BOINC client, as opposed to killing the processes.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

----------------------------------------
[Edit 1 times, last edit by hchc at Feb 1, 2018 3:10:37 PM]
[Feb 1, 2018 3:09:36 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Is this a runaway task, and what should I do about it (if anything)?

I have discovered that when tasks seem to lock up, restarting the computer sometimes helps. Otherwise I email World Community Grid with the problem tasks.
[Feb 1, 2018 9:20:33 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: Is this a runaway task, and what should I do about it (if anything)?

I get one of those tasks more or less frequently, and I am pretty sure I posted about this before with no reaction from the techs.
Restarting doesn't help, as those WUs usually have a last checkpoint at (or near) 0%. If they continue, they usually end up in a computation error.
I have gotten used to aborting any of those tasks if they run more than 25% of the average runtime on that host. The sad thing, however, is that I do not have easy access to all the hosts I crunch on, hence I don't know how many of them get held up by those "ghost tasks".
When one of those tasks "hangs", others from the same project keep processing just fine. I have not been able to identify any commonality among those faulty WUs...
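
The abort rule above can be written down as a small check. This is a sketch under a stated assumption: it reads "25%" as "25% longer than the host's average runtime", which is one possible reading of the post, and the `factor` parameter is there so another threshold can be used. The function name `should_abort` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def should_abort(
    elapsed_hours: float,
    past_runtimes_hours: Sequence[float],
    factor: float = 1.25,
) -> bool:
    """True when a task has run longer than `factor` times the host average.

    Assumption: factor=1.25 encodes "25% over the average runtime".
    """
    if not past_runtimes_hours:
        return False  # no history yet, so no basis for a decision
    return elapsed_hours > factor * mean(past_runtimes_hours)
```

With past runtimes of 4, 5 and 6 hours, the threshold is 6.25 hours, so a task at 96 hours elapsed would be flagged and a task at 5 hours would not.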

Ralf
[Feb 6, 2018 12:20:59 AM]
Chris Doran
Cruncher
Joined: Apr 28, 2007
Post Count: 4
Status: Offline
Re: Is this a runaway task, and what should I do about it (if anything)?

It's a problem that's been around for a year or more. Unlike others, I've never seen another task start once I've got a runaway, so it's a complete block on crunching.

This is what I do: In the BOINC Manager Tasks tab right pane, left click on the runaway to highlight it, then on Suspend in the left pane. Another task will start shortly with Progress % clocking up. Wait a minute or so, then Resume the original task. It may start running immediately or may wait until the second task completes. I've read that to get this trick to work, you need to go to Tools\Computing preferences\disk and memory usage and uncheck "Leave applications in memory whilst suspended", but don't know whether this is really necessary.
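
For hosts without easy access to the BOINC Manager window, the same suspend, wait, resume sequence can be sketched with the `boinccmd` command-line tool, which accepts `--task URL task_name {suspend|resume|abort}`. Everything else here is an assumption: the project URL must match what the client itself reports for the project, `boinccmd` must be on the PATH and allowed to talk to the client, and the helper names `task_command` and `bounce_task` are hypothetical.

```python
import subprocess
import time
from typing import Callable, List

# Assumption: replace with the project URL exactly as the client reports it.
PROJECT_URL = "http://www.worldcommunitygrid.org/"


def task_command(
    task_name: str, operation: str, project_url: str = PROJECT_URL
) -> List[str]:
    """Build the argument list for `boinccmd --task URL NAME OP`."""
    if operation not in {"suspend", "resume", "abort"}:
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--task", project_url, task_name, operation]


def bounce_task(
    task_name: str,
    wait_seconds: float = 60.0,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Suspend the task, wait so another task can start, then resume it."""
    run(task_command(task_name, "suspend"), check=True)
    time.sleep(wait_seconds)
    run(task_command(task_name, "resume"), check=True)
```

`bounce_task("MCM1_0139548_9816_1")` would mirror the manual steps with a one-minute pause. The `run` parameter exists only so the sequence can be tried with a stand-in function before pointing it at a real client.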

You need to check for "runaway" tasks every few hours, at least daily.
[Feb 24, 2018 10:11:45 PM]
GaryWorster
Cruncher
United States
Joined: Apr 16, 2013
Post Count: 7
Status: Offline
Re: Is this a runaway task, and what should I do about it (if anything)?

I've been having runaway tasks lately as well. Frequently mine will get to 100%, but then just sit there with the Elapsed Time continuing to count up and the Remaining Time blank. Right now I have 7 runaway tasks continuing to increase the Elapsed Time with no Remaining Time left, all preventing new tasks from starting.

Suspending/Resuming hasn't seemed to help in the past, but maybe I haven't given it enough chance to recycle itself. I'll try unchecking the "Leave applications in memory whilst suspended" option as well. Thanks for the tip, Chris Doran.
[Feb 27, 2018 6:21:43 AM]