| World Community Grid Forums
Thread Status: Active Total posts in this thread: 39
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Odd, but anyway, a sure procedure becomes:
1. Remove the Leave in Memory (LIM) option (use local prefs).
2. Suspend WCG in the Projects tab.
3. Wait, then Resume.
4. Reverse step 1 if applicable.

Alternatively, exit BOINC; if it runs as a Service (Protected install), do it through the Activity menu Suspend option, observing steps 1 and 4. This will, though, unload all running projects, whatever their state, and on a quad / dual clover that makes it worth considering just killing the stuck job itself.

Snooze unloading with LIM off is a surprise (it does so under 6.2.16, as I just confirmed). Just for a 10-minute break to do something hefty on a quad, unloading jobs with long checkpoint spacing is costly. Same for Benchmark: in my opinion it should not unload, but it does, which is too stupid to write home about (e.g. an HCC job on slower brethren has an hour between checkpoints towards the end). You have no control over the periodic benchmarking unless micromanaging. Somewhere in 5 there is a behaviour change, but I have not tested it for a long time.

So, effectively, those with enough RAM should certainly get the recommendation to have LIM on under normal crunching conditions to stop loss during benchmarking and snoozing. Oh well.
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Aug 12, 2008 8:38:04 AM] |
||
|
|
petehardy
Senior Cruncher USA Joined: May 4, 2007 Post Count: 318 Status: Offline Project Badges:
|
Hi All,
I've got a stuck HPF2 WU:

Created: 08/28/2008 16:07:07
Name: lw953_00004
Minimum Quorum: 15
Initial Replication: 19

The large number of copies sent out for this workunit is due to the unique nature of this project. We encourage you to read the FAQs about this project for more information.

| Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|
| lw953_00004_15-- | In Progress | 08/29/2008 15:14:28 | 09/18/2008 15:14:28 | 0.00 | 0.0 / 0.0 |
| lw953_00004_3-- | Pending Validation | 08/29/2008 15:02:37 | 08/30/2008 18:27:04 | 17.27 | 109.2 / 0.0 |
| lw953_00004_13-- | In Progress | 08/29/2008 14:58:19 | 09/18/2008 14:58:19 | 0.00 | 0.0 / 0.0 |
| lw953_00004_5-- | Pending Validation | 08/29/2008 14:43:41 | 08/30/2008 06:12:01 | 13.39 | 101.0 / 0.0 |
| lw953_00004_1-- | Pending Validation | 08/29/2008 14:40:49 | 08/30/2008 04:38:04 | 5.14 | 80.9 / 0.0 |
| lw953_00004_9-- | Pending Validation | 08/29/2008 14:34:58 | 08/30/2008 16:57:40 | 5.66 | 78.9 / 0.0 |
| lw953_00004_7-- | Pending Validation | 08/29/2008 14:30:26 | 08/30/2008 12:40:57 | 6.06 | 83.9 / 0.0 |
| lw953_00004_11-- | Pending Validation | 08/29/2008 14:27:37 | 08/30/2008 14:32:35 | 5.16 | 82.2 / 0.0 |
| lw953_00004_17-- | In Progress | 08/29/2008 14:21:53 | 09/18/2008 14:21:53 | 0.00 | 0.0 / 0.0 |
| lw953_00004_18-- | Pending Validation | 08/29/2008 02:24:08 | 08/29/2008 20:59:13 | 4.26 | 76.0 / 0.0 |
| lw953_00004_10-- | In Progress | 08/29/2008 02:18:06 | 09/18/2008 02:18:06 | 0.00 | 0.0 / 0.0 |
| lw953_00004_6-- | Pending Validation | 08/29/2008 02:15:30 | 08/30/2008 04:16:12 | 23.70 | 91.0 / 0.0 |
| lw953_00004_16-- | Pending Validation | 08/29/2008 02:14:45 | 08/29/2008 19:11:05 | 5.13 | 80.0 / 0.0 |
| lw953_00004_8-- | Pending Validation | 08/29/2008 01:54:49 | 08/29/2008 18:57:39 | 7.26 | 102.7 / 0.0 |
| lw953_00004_12-- | Pending Validation | 08/29/2008 01:54:46 | 08/29/2008 15:56:01 | 6.61 | 91.3 / 0.0 |
| lw953_00004_0-- | In Progress | 08/29/2008 01:54:38 | 09/18/2008 01:54:38 | 0.00 | 0.0 / 0.0 |
| lw953_00004_4-- | Pending Validation | 08/29/2008 01:52:14 | 08/30/2008 05:34:57 | 9.78 | 63.1 / 0.0 |
| lw953_00004_14-- | Pending Validation | 08/29/2008 01:47:14 | 08/30/2008 01:48:13 | 5.09 | 76.6 / 0.0 |
| lw953_00004_2-- | Pending Validation | 08/29/2008 01:37:45 | 08/29/2008 22:45:17 | 7.77 | 75.5 / 0.0 |

It's used 36hrs cpu and is stuck at 37%. I tried suspend/resume on the project and the task - still stuck. I've copied the slots directory and the messages. Can I get any more documentation?
----------------------------------------
"Patience is a virtue", I can't wait to learn it!
[Edit 1 times, last edit by petehardy at Aug 30, 2008 10:09:47 PM]
||
|
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3716 Status: Offline Project Badges:
|
If not yet done you can try stopping and restarting Boinc.
We used to think that suspend/resume project should do the same, however after recently making a few tests about the "Leave in Memory (LIM)" option it is clear that if it is set to ON a suspend/resume of either the task or the project will not force a restart at the last checkpoint. Stopping/restarting Boinc will.

Please let us know what your LIM setting is (my bet is ON), and if you finally got your task to restart and jump over the 37% barrier.

Good luck. Jean.
||
|
|
petehardy
Senior Cruncher USA Joined: May 4, 2007 Post Count: 318 Status: Offline Project Badges:
|
LIM was ON, and I run Boinc as a service(XP).
Here's what I did:
1. Suspend project
2. Right click tray icon->Exit
3. Start Boinc System Tray
4. Start Boinc Manager
5. Job still using cpu - no change in progress.
6. Set LIM to OFF
7. Suspend project
8. Wait a few seconds then Resume project.

The last action kicked it off. CPU time went down to around 3hrs. and the job started to progress normally.

A benchmark ran at 3.44 Eastern time, which is 36hrs after the job started and about 1 hr before I noticed the problem (according to post time, the forum is on Central time I think).

Sorry about the rambling.
Pete
----------------------------------------
"Patience is a virtue", I can't wait to learn it!
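Editor's sketch: steps 6 to 8 above (LIM off, suspend, wait, resume) could in principle be scripted rather than clicked through. The Python below only builds the pieces and does not run anything. It assumes the `boinccmd` tool that ships with the BOINC client, its `--project URL suspend|resume` and `--read_global_prefs_override` options, and the `leave_apps_in_memory` element of `global_prefs_override.xml`; whether these behave the same under the 6.2 client discussed here is an assumption. The project URL and both function names are illustrative, not taken from the thread.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: the URL under which the project is attached; check the Projects tab.
WCG_URL = "http://www.worldcommunitygrid.org/"


def lim_override_xml(leave_in_memory: bool) -> str:
    """Build a minimal local-preferences override that sets only the LIM flag.

    Caution: writing this text to global_prefs_override.xml would replace
    any other local overrides already stored in that file.
    """
    root = ET.Element("global_preferences")
    flag = ET.SubElement(root, "leave_apps_in_memory")
    flag.text = "1" if leave_in_memory else "0"
    return ET.tostring(root, encoding="unicode")


def restart_commands(project_url: str = WCG_URL) -> List[List[str]]:
    """Return the command lines for: reload local prefs, suspend, resume.

    The caller is expected to pause a few seconds between the suspend
    and resume commands, as in step 8 of the post.
    """
    return [
        ["boinccmd", "--read_global_prefs_override"],
        ["boinccmd", "--project", project_url, "suspend"],
        ["boinccmd", "--project", project_url, "resume"],
    ]
```

To use it, one would write the output of `lim_override_xml(False)` to the override file in the BOINC data directory, run the three commands in order (for example with `subprocess.run`), and afterwards restore LIM to ON the same way.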
||
|
|
petehardy
Senior Cruncher USA Joined: May 4, 2007 Post Count: 318 Status: Offline Project Badges:
|
Sekerob,
> If only the feature like in BOINCview with colour coding and a progress alert pop-up, there'd be no 22 hours lost time on semi attended clients.

This job (see earlier posts) has 20 days to deadline. Why can't Boinc see that many CPU hours are being used without any progress?

I monitor BoincView regularly, it tells you the status of all your computers, network and whether you've got some sort of resource hogging program running, for example the new Microsoft Search. I'm surprised that I didn't see this one for 36 hours.

Pete
----------------------------------------
"Patience is a virtue", I can't wait to learn it!
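Editor's sketch: the check Pete asks for (CPU hours piling up with no progress) can be illustrated in a few lines of Python. This is not how BOINC or BoincView works; `Sample`, `is_stalled` and the threshold parameter are hypothetical names, and the samples are assumed to come from periodically reading a task's CPU time and progress. Some applications only update progress at checkpoints, so the threshold would need to exceed the longest expected gap between checkpoints to avoid false alarms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    cpu_hours: float      # cumulative CPU time reported for the task
    fraction_done: float  # reported progress, 0.0 to 1.0


def is_stalled(samples: List[Sample], max_cpu_hours_without_progress: float) -> bool:
    """Return True if CPU time has grown past the threshold at unchanged progress."""
    if len(samples) < 2:
        return False
    latest = samples[-1]
    # Walk backwards to the earliest sample of the trailing run that
    # reports the same progress as the latest sample.
    start = latest
    for sample in reversed(samples):
        if sample.fraction_done != latest.fraction_done:
            break
        start = sample
    return latest.cpu_hours - start.cpu_hours > max_cpu_hours_without_progress
```

For example, with samples of 10 and then 36 CPU hours both at 37% and a 2-hour threshold, the function reports a stall; a task whose progress rises with every sample does not trigger it.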
||
|
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3716 Status: Offline Project Badges:
|
> LIM was ON, and I run Boinc as a service(XP).

I forgot about that case, I am glad that you thought of setting LIM OFF instead of rebooting!

> A benchmark ran at 3.44 Eastern time, which is 36hrs after the job started and about 1 hr before I noticed the problem(according to post time, the forum is on Central time I think). Sorry about the rambling.

No problem. We have seen worse!

Regarding the forum, it is on the time zone that you or the default install have set. You may change it by selecting "My Forum Profile" in the forum tool bar, then clicking "Change my information" at the top. There you will find the Time Zone setting, which you can change as you like. Mine is on GMT for convenience although France is currently GMT+2.

Cheers. Jean.
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
The LIM setting part was added to the FAQ last time we discussed this finding (and once again, the case is proven).
petehardy,

There are projects that do not record progress at all except when checkpointing. There's one that has 10 segments, so the progress jumps 10 times and 10% is added each time. BOINC has no kill feature because it's software that just manages and records but has no control over what happens inside the science application. When a task uses CPU time, it's considered to be alive.

The benchmark info is of interest. Another person sees HPF2 hanging after a benchmark, but not always. Force the benchmark to happen at a convenient time of the day and it will always run at that time, every 5th day. Curious to know what happens.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The other thing to point out here is that exiting BOINC Manager does not stop the BOINC service. The notification area icon is part of BOINC Manager - it is not the same as the service, and it has nothing to do with boinctray.exe. Confusing, yes?
|
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
The "Stuck" problem goes back way before boinctray.exe was introduced, so doubt very much this variable has anything to do with it.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Absolutely nothing.
|
||
|
|
|