World Community Grid Forums
Category: Completed Research
Forum: Microbiome Immunity Project
Thread: Tasks - 00287964 - Reverting to 20% and taking significantly longer
Thread Status: Active Total posts in this thread: 8
Combat Marmot
Cruncher Joined: Jan 17, 2007 Post Count: 2 Status: Offline Project Badges: |
Hello All,
Has anyone noticed strange behaviour from some of the Microbiome tasks? On one PC, the maximum run time for any Microbiome WU was just short of two hours. Tasks in the 00287964 series, however, revert to 20% progress and take significantly longer.
I aborted 8 tasks like this, thinking it was a fault, before one completed. Also, the checkpoints for these tasks don't seem to save correctly: a reboot sets the progress of all of them to zero. All other WCG projects are running as normal (e.g. Mapping Cancer Markers). I've checked RAM and allocated disc space; both have capacity. Any ideas what's going on?
[Edit 1 times, last edit by Combat Marmot at Apr 5, 2020 5:42:51 PM]
||
|
yoerik
Senior Cruncher Canada Joined: Mar 24, 2020 Post Count: 413 Status: Offline Project Badges: |
The same thing happened to me: tasks would stop at 60% for hours. I had to abort them in order to meet deadlines for other projects, and I have turned MIP off via preferences for now.
Hopefully this gets fixed.
||
|
Combat Marmot
Cruncher Joined: Jan 17, 2007 Post Count: 2 Status: Offline Project Badges: |
Thanks for the information, I'm glad I'm not alone in noticing it.
Seven of the tasks finally completed, with run times between 4:51 and 5:12!
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Exclude your BOINC data directory from virus scanning, just in case the antivirus (AV) keeps inspecting the models and sees processes or strings it is not sure about.
[Edit 1 times, last edit by Former Member at Apr 6, 2020 11:42:29 AM]
||
|
JSYKES
Senior Cruncher Joined: Apr 28, 2007 Post Count: 200 Status: Offline Project Badges: |
I'm relieved to see this thread - I thought I'd got a problem with the stability of my PC! I have now checked, and there's chronic inconsistency in checkpointing. Some WUs will checkpoint after (say) 9 mins / 10%, while others will get to 60% with still no checkpoint and then suddenly, after anything between 45 and 90 mins, decide to restart, losing hours of work across the various cores. I'm at a loss to know what to do. Has anyone (including the WCG techs?) got any ideas? Has anything changed in the format of the WUs?
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
MIP tasks are sized based on the estimated duration of each structure. If one structure takes 2 hours, it's packed in a task on its own; if others take 10 minutes each, 12 are packed together, or however many it takes to reach an average total runtime of, say, 2 hours. That's why you see these wide spreads of checkpoint times.
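As a rough illustration of the packing described above, here is a minimal sketch. It is not WCG's actual batching code: the function name `pack_structures`, the greedy strategy and the 120-minute target are all assumptions made for the example. It also assumes, as the post implies, that a checkpoint can only be written when a structure finishes.

```python
from typing import List


def pack_structures(estimated_minutes: List[float],
                    target_minutes: float = 120.0) -> List[List[float]]:
    """Greedily group per-structure runtime estimates into tasks.

    Hypothetical helper: a new task is started whenever adding the next
    structure would push the current task past target_minutes.
    """
    tasks: List[List[float]] = []
    current: List[float] = []
    total = 0.0
    for est in estimated_minutes:
        # Close the current task if this structure would overshoot the target.
        if current and total + est > target_minutes:
            tasks.append(current)
            current, total = [], 0.0
        current.append(est)
        total += est
    if current:
        tasks.append(current)
    return tasks


# One 2-hour structure followed by twelve 10-minute structures.
tasks = pack_structures([120] + [10] * 12)
for task in tasks:
    print(len(task), "structure(s),", sum(task), "minutes estimated")
```

Under these assumptions the first task holds a single long structure and could run for about two hours before its first checkpoint, while the second task could checkpoint roughly every ten minutes.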
A task that has not yet written a checkpoint is kept in memory even when paused. Tasks that have written a checkpoint are paused if the scheduler decides to reorder priorities. By ticking 'Leave non-GPU tasks in memory while suspended', you prevent the regressions, unless you reboot. Tasks that keep jumping back while running are best aborted; let another machine have a go at them. Sheet happens, although I don't see this at all for any science at WCG. The only occasional recurrence is the download failure on FAHB and one other science: bandwidth lost, but no crunching time.
[Edit 1 times, last edit by Former Member at Apr 11, 2020 12:07:12 PM]
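For anyone who prefers to check the setting outside the BOINC Manager, the sketch below tests whether a preferences document turns it on. To my knowledge the checkbox corresponds to the `leave_apps_in_memory` element in BOINC's global preferences XML (local overrides live in `global_prefs_override.xml` in the data directory), but treat that mapping, the helper name `leaves_apps_in_memory` and the sample document as assumptions to verify against your own installation.

```python
import xml.etree.ElementTree as ET


def leaves_apps_in_memory(prefs_xml: str) -> bool:
    """Return True if the preferences XML enables leave_apps_in_memory.

    Hypothetical helper: it only inspects the text passed in and does
    not locate or modify any BOINC files.
    """
    root = ET.fromstring(prefs_xml)
    node = root.find(".//leave_apps_in_memory")
    if node is None or node.text is None:
        return False  # element missing: treat the setting as off
    return node.text.strip() == "1"


# Illustrative sample, not copied from a real installation.
sample = """<global_preferences>
   <leave_apps_in_memory>1</leave_apps_in_memory>
</global_preferences>"""

print(leaves_apps_in_memory(sample))
```

To use it on a real machine, read the contents of your own override file into a string and pass that in; the file location depends on the operating system and how BOINC was installed.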
||
|
JSYKES
Senior Cruncher Joined: Apr 28, 2007 Post Count: 200 Status: Offline Project Badges: |
I also had a similar problem with a few of WU# 288545: they got to 60% and then restarted, with no checkpoints, which was a waste of time. I gave them a few cycles and then aborted them.
[Edit 1 times, last edit by JSYKES at Apr 15, 2020 3:01:24 PM]
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It seems they eventually finish. Are they valid? Has anyone looked at the task output to see if they may have encountered some kind of error? Maybe a new kind of target or a new research approach.