Thread Status: Active
Total posts in this thread: 8
This topic has been viewed 4157 times and has 7 replies
Combat Marmot
Cruncher
Joined: Jan 17, 2007
Post Count: 2
Tasks - 00287964 - Reverting to 20% and taking significantly longer

Hello All,

Has anyone noticed strange behaviour from some of the Microbiome Tasks?

On one PC, the maximum run time for any Microbiome WU was just short of two hours. Tasks such as the 00287964 series do this:

  • Proceed as normal. Percentage complete increases by <0.01% per second
  • Get to a certain point (above 75%) then revert to 20%
  • Increase at a rate of 20% every 20 minutes or so (i.e. it is a step change of 20%)
  • Get stuck at about 60%
  • Finally complete after about 3 hours.


I aborted 8 tasks like this, thinking it was a fault, before one finally completed. Also, the checkpoints for these tasks don't seem to save correctly: a reboot resets the progress of all of them to zero.

All other WCG projects are going as normal (e.g. mapping cancer markers).

I've checked RAM and allocated disc space - both have capacity.

Any ideas what's going on?
----------------------------------------
[Edit 1 times, last edit by Combat Marmot at Apr 5, 2020 5:42:51 PM]
[Apr 5, 2020 5:39:56 PM]
yoerik
Senior Cruncher
Canada
Joined: Mar 24, 2020
Post Count: 413
Re: Tasks - 00287964 - Reverting to 20% and taking significantly longer

The same thing happened to me - it would stop at 60% for hours. Had to abort in order to meet deadlines for other projects - and I turned MIP off via preferences for now.

Hopefully this gets fixed.
----------------------------------------

[Apr 5, 2020 6:31:38 PM]
Combat Marmot
Cruncher
Joined: Jan 17, 2007
Post Count: 2
Re: Tasks - 00287964 - Reverting to 20% and taking significantly longer

Thanks for the information, I'm glad I'm not alone in noticing it.

7 of the tasks finally completed, taking times between 4:51 and 5:12!
[Apr 6, 2020 11:20:10 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Tasks - 00287964 - Reverting to 20% and taking significantly longer

Exclude your BOINC data directory from virus scanning, just in case the AV keeps inspecting the models and flags processes or strings it is not sure about.
----------------------------------------
[Edit 1 times, last edit by Former Member at Apr 6, 2020 11:42:29 AM]
[Apr 6, 2020 11:41:50 AM]
JSYKES
Senior Cruncher
Joined: Apr 28, 2007
Post Count: 200
Re: Tasks - 00287964 - Reverting to 20% and taking significantly longer

I'm relieved to see this thread - I thought I had a problem with the stability of my PC! I have now checked and there's chronic inconsistency with checkpointing - some WUs will checkpoint after (say) 9 mins / 10%, while others will get to 60% with still no checkpoint and then suddenly, after anything between 45 and 90 mins, decide to restart, losing hours of work across the various cores. I'm at a loss to know what to do - has anyone (including the WCG Techs?) got any ideas? Has anything changed in the format of the WUs?
----------------------------------------

[Apr 11, 2020 11:36:33 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Tasks - 00287964 - Reverting to 20% and taking significantly longer

MIP tasks are sized by the estimated duration of each structure. If one structure takes 2 hours, it is packed in a task on its own; if others take 10 minutes each, 12 are packed together, or however many it takes to reach an average total runtime of, say, 2 hours. That's why you see these wide spreads in checkpoint times.
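To make the packing arithmetic concrete, here is a rough sketch in Python. It is not the actual WCG work generator: the function names, the greedy grouping, and the idea that a task checkpoints once per finished structure are all assumptions for illustration only.

```python
def pack_structures(estimated_minutes, target_minutes=120):
    """Greedily group structures so each task totals roughly target_minutes.

    Hypothetical helper: the real MIP batching logic is not public here.
    """
    tasks, current, total = [], [], 0
    for minutes in estimated_minutes:
        # Start a new task once adding this structure would exceed the target.
        if current and total + minutes > target_minutes:
            tasks.append(current)
            current, total = [], 0
        current.append(minutes)
        total += minutes
    if current:
        tasks.append(current)
    return tasks


def checkpoint_progress(task):
    """Percent complete at each checkpoint, ASSUMING one checkpoint per structure."""
    total = sum(task)
    done, marks = 0, []
    for minutes in task:
        done += minutes
        marks.append(round(100 * done / total, 1))
    return marks


# One 2-hour structure: a single task whose only checkpoint is at the very end.
long_task = pack_structures([120])[0]
# Twelve 10-minute structures: one task with a checkpoint about every 10 minutes.
short_task = pack_structures([10] * 12)[0]
print(checkpoint_progress(long_task))
print(checkpoint_progress(short_task)[:3])
```

Under these assumptions, a task holding one long structure would have nothing to fall back on until it finishes, while a task holding many short structures would save progress every few minutes.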

A task that has not checkpointed yet will be kept in memory even when paused. Tasks that have checkpointed are paused if scheduling decides to reorder priority. By ticking 'Leave non-GPU tasks in memory while suspended', you prevent the regressions, unless you reboot.

Tasks that are just jumping back while running are best aborted. Let another machine have a go at it.

Sheet happens, albeit I don't get this at all for any science at WCG. The only incidental recurrence is the download fail on FAHB and one other science. Bandwidth lost but no crunching time.
----------------------------------------
[Edit 1 times, last edit by Former Member at Apr 11, 2020 12:07:12 PM]
[Apr 11, 2020 12:06:46 PM]
JSYKES
Senior Cruncher
Joined: Apr 28, 2007
Post Count: 200
Re: Tasks - 00287964 - Reverting to 20% and taking significantly longer

I also had a similar problem with a few of the WU# 288545 tasks - they got to 60% and then restarted, with no checkpoints, which was a waste of time. I gave them a few cycles and then aborted them.
----------------------------------------
[Edit 1 times, last edit by JSYKES at Apr 15, 2020 3:01:24 PM]
[Apr 15, 2020 2:59:36 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Tasks - 00287964 - Reverting to 20% and taking significantly longer

It seems they eventually finish. Are they valid? Has anyone looked at the task output to see if they may have encountered some kind of error? Maybe a new kind of target or a new research approach.
[Apr 15, 2020 3:18:17 PM]