| World Community Grid Forums
Thread Status: Active Total posts in this thread: 6
Highwire
Cruncher Joined: Aug 18, 2006 Post Count: 39 Status: Offline Project Badges:
|
Hi. I've been on WCG for a while, using the UD client with no problems, running mostly FAAH.
I'd used the Boinc client before on the climate change experiment and used to have issues with it (crashing tasks etc). I've a new XP machine and noticed there were new projects only for the Boinc client, so with a little trepidation I installed it to try it.
A little while after installing I restarted the machine (proper restart) and noticed the task (FAAH) had restarted. So I gave it a reasonable while, until the task was over 5%, then restarted the machine and watched. It came up as 5%. 'Great,' I thought... then right in front of my eyes it went to 0% again. It says 'restarting task faah..' or whatever in the log file stdoutdae.txt, but gives no more information as to why.
I'm getting a horrible sense of deja vu here with the Boinc client, and to be honest my patience has gone PDQ with it due to my previous bad experiences. I've had a look around and can't see an obvious solution. So without me wasting the rest of my Sunday researching (sorry but this is just so frustrating!) can anyone tell me what might be causing Boinc to have wasted all those CPU cycles?!
I've used the UD client for over a year on two machines and it's been nothing short of bulletproof. The UD client also ran an entire FAAH task on that XP machine fine before I switched it to Boinc (the UD client is now uninstalled).
Thanks in advance, (please convert me to Boinc before I bin it again!) |
||
|
|
retsof
Former Community Advisor USA Joined: Jul 31, 2005 Post Count: 6824 Status: Offline Project Badges:
|
It still sounds like the workunit hasn't reached the first checkpoint before you turned it off.
As an experiment, or if the computer is not up for very long, you might try running some DDDT (Discovering Dengue Drugs Together). Checkpoints are very close together. Here are a few of my own. Time between checkpoints depends on the speed of the computer, and naturally each project could behave differently (more or fewer checkpoints).
11/25/2007 5:47:55 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
11/25/2007 5:49:37 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
11/25/2007 5:51:16 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
11/25/2007 5:52:52 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
11/25/2007 5:54:25 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
Are your checkpoints showing up in your BOINC messages? If not, here's how to get them. This FAQ shows how to add or modify checkpoint logging in a cc_config.xml file in the BOINC directory.
https://secure.worldcommunitygrid.org/forums/wcg/viewthread?thread=11332
After you set it up, do an Advanced / Read config file.
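For anyone who wants to measure the spacing between checkpoints rather than eyeball it, here is a minimal Python sketch. It assumes the message format shown in the log lines above (timestamp, project name and message separated by `|`); the function name `checkpoint_intervals` is made up for this example and is not part of BOINC.

```python
from datetime import datetime

# Log lines copied from the post above.
LOG = """\
11/25/2007 5:47:55 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
11/25/2007 5:49:37 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
11/25/2007 5:51:16 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
11/25/2007 5:52:52 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
11/25/2007 5:54:25 PM|World Community Grid|[checkpoint_debug] result dddt0201h0466_ZINC03332638-0000_00_0 checkpointed
"""


def checkpoint_intervals(lines):
    """Return the seconds between consecutive [checkpoint_debug] messages.

    Hypothetical helper: assumes each line starts with a timestamp like
    '11/25/2007 5:47:55 PM' followed by a '|' separator.
    """
    stamps = []
    for line in lines:
        if "[checkpoint_debug]" not in line:
            continue  # ignore every other kind of message
        timestamp = line.split("|", 1)[0].strip()
        stamps.append(datetime.strptime(timestamp, "%m/%d/%Y %I:%M:%S %p"))
    # Pair each checkpoint with the next one and take the difference.
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(checkpoint_intervals(LOG.splitlines()))
```

On the lines above this gives intervals of 102, 99, 96 and 93 seconds, so on that machine a shutdown would cost at most about a minute and a half of DDDT work. Whatever was crunched since the last checkpoint is what gets lost on a restart.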
SUPPORT ADVISOR
----------------------------------------Work+GPU i7 8700 12threads School i7 4770 8threads Default+GPU Ryzen 7 3700X 16threads Ryzen 7 3800X 16 threads Ryzen 9 3900X 24threads Home i7 3540M 4threads50% [Edit 5 times, last edit by retsof at Nov 26, 2007 12:30:54 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Highwire, UD just never told you about its problems. Sadly, crunching using UD had as many problems as BOINC (if not more), but many of the problems were simply invisible.
Now, a terminology detail: BOINC always says "Restarting task" when what it more accurately means is "Resuming task". It's just a poor choice of words. See retsof's post for an explanation of why you may still lose small amounts of work if you turn your computer off frequently. |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7849 Status: Offline Project Badges:
|
If I remember correctly, FAAH tasks have quite a number of differing checkpoint schemes. How much CPU time elapsed before you shut down?
If it never reached the first checkpoint, it would go back to zero.
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Highwire
Cruncher Joined: Aug 18, 2006 Post Count: 39 Status: Offline Project Badges:
|
"It still sounds like the workunit hasn't reached the first checkpoint before you turned it off. As an experiment or if the computer is not up for very long, you might try running some DDDT (Discovering Dengue Drugs Together). Checkpoints are very close together. Here are a few of my own. Time between checkpoints depends on the speed of the computer, and naturally each project could behave differently (more or fewer checkpoints). Are your checkpoints showing up in your BOINC messages? If not, here's how to get them."
Thanks to you and the other guy for replying. No, checkpoints aren't showing. I'll see if I can enable them.
I've let it run longer (hitting 9.X%) and restarted just Boinc (nicely) to see what it logged, as I'd read that a quick kill on shutdown can cause errors. When I restarted, it showed 9.X% and then fell back to 6% after a little time, so maybe it is as you say. The workunits on FAAH are a reasonable size, so it looks like 6% might be the first checkpoint, which seems a little large? Is it possible Boinc has fewer checkpoints than UD?
What threw me was the fact that it STARTED with the correct % showing, which looked like it 'knew where it was', then dropped to zero, which appeared to be 'restarting for some reason', and it said so in the log. Maybe I've just jumped to conclusions - I remember that awful feeling when the massive climate change model restarted! I'll keep an eye on it.
It would be nice if checkpoints were more finely grained, I suppose. There must be a lot of CPU cycles lost when you add it all up over all projects/users/time. Especially for people who run these things as screensavers?! |
||
|
|
retsof
Former Community Advisor USA Joined: Jul 31, 2005 Post Count: 6824 Status: Offline Project Badges:
|
"I'd read that a quick kill on shutdown can cause errors."
Pushing a switch instead of doing a shutdown is always uncool, but otherwise that could be the Vista fast shutdown error. There's a new version of BOINC (either in test or production by now??) that solves that problem. 24/7 crunching solves most of the checkpoint concerns.
SUPPORT ADVISOR
----------------------------------------Work+GPU i7 8700 12threads School i7 4770 8threads Default+GPU Ryzen 7 3700X 16threads Ryzen 7 3800X 16 threads Ryzen 9 3900X 24threads Home i7 3540M 4threads50% [Edit 3 times, last edit by retsof at Nov 26, 2007 1:34:56 AM] |
||
|
|
|