| World Community Grid Forums
Thread Status: Active Total posts in this thread: 22
Muckebadscher
Cruncher Joined: Jun 13, 2006 Post Count: 8 Status: Offline Project Badges:
|
I'm using two systems at the moment:
PC 1: Win 7, 8-core CPU + 1 GPU = 9 WUs in parallel (not only WCG)
PC 2: Win XP, 4-core CPU = 4 WUs in parallel
My settings are to save to disk every 120 sec. Only with the Clean Energy Project do I have the problem that it doesn't save checkpoints correctly. A WU normally takes 7 h, but the checkpoints come only after 1 h or 3 h (approx. 35%)... nothing more. That means when I shut down the PC or BOINC Manager with only 1 h (approx. 85%) left, after restarting BOINC Manager I was back at 35% with 5-7 h left. Because I have this problem on both systems, only with the Clean Energy Project, and even after updating to the newest version of BOINC Manager, I assume the problem lies with the Clean Energy Project WUs. Now I'm a little surprised that you do not know this kind of problem... I tried what you proposed, without success. Do you have any further ideas?
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
As alluded to, there are 16 checkpoints in a CEP2 task. On my 8-core i7 lappie, the first and second come within several to 10-15 minutes, then the third job (#2) takes 3-5 hours. BOINCTasks (a third-party BOINC manager) will actually count down the checkpoints, but you can see where the progress of a task really is in the stderr.txt file of the job slot, typically on W7 in C:\ProgramData\BOINC\Slots\#. The stderr.txt file is the same as what's uploaded as the Result Log, which you can view on the Result Status page. A sample from a hand-started CEP2 task, now working on the third job, is this:

INFO: No state to restore. Start from the beginning.
[10:18:23] Number of jobs = 16
[10:18:23] Starting job 0,CPU time has been restored to 0.000000.
[10:23:12] Finished Job #0
[10:23:12] Starting job 1,CPU time has been restored to 277.993782.
[10:38:10] Finished Job #1
[10:38:10] Starting job 2,CPU time has been restored to 1144.672938.

(The first job is numbered 0, and each job takes the output from the previous job as input.)

Note that a CEP2 task assumes a 12-hour run time on systems that never finish them completely [12 hours is the cut-off]. The progress percentage is then based on the 12-hour assumption.

Setting the write interval has no effect on CEP2, whether 2 minutes or 2 days. It writes the output to disk only at the end of a job; otherwise the writes would be way too big, in the gigabytes, which would be disastrous if you were using the computer, as heavy disk I/O really impairs the user.

If a whole job takes 7 hours, and there are no write points but the ones observed, the question is: what is then logged in the stderr.txt file after a long run time and prior to hibernating? When resumed from hibernation, there should not be an interruption logged in the stderr.txt file. If there is, what does it look like? When restarted, the log shows from which point, and at how many seconds of CPU time, the resume takes place.

In the past, shutting down too fast (not hibernating or standby) could damage a task. When this is observed, stopping BOINC first, waiting a minute to let the writes to disk complete, and then shutting down would prevent that. Typically this happened on Vista. It was fixed with later BOINC clients.
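For anyone who would rather not read stderr.txt by eye, here is a minimal Python sketch that pulls the job count, the finished jobs, and the last restart point out of a log like the sample above. The function name and the regular expressions are my own and are based only on the lines shown in this post; other CEP2 versions may log differently.

```python
import re

# Patterns are assumptions derived from the sample log in this thread.
JOBS_RE = re.compile(r"Number of jobs = (\d+)")
START_RE = re.compile(
    r"Starting job (\d+),\s*CPU time has been restored to (\d+(?:\.\d+)?)"
)
DONE_RE = re.compile(r"Finished Job #(\d+)")

SAMPLE = """INFO: No state to restore. Start from the beginning.
[10:18:23] Number of jobs = 16
[10:18:23] Starting job 0,CPU time has been restored to 0.000000.
[10:23:12] Finished Job #0
[10:23:12] Starting job 1,CPU time has been restored to 277.993782.
[10:38:10] Finished Job #1
[10:38:10] Starting job 2,CPU time has been restored to 1144.672938.
"""


def summarize_cep2_log(text):
    """Summarize a CEP2 stderr.txt: total jobs, finished jobs, last start."""
    jobs = JOBS_RE.search(text)
    starts = list(START_RE.finditer(text))
    finished = {int(m.group(1)) for m in DONE_RE.finditer(text)}
    # The last "Starting job" entry is where the task last (re)started.
    last = starts[-1] if starts else None
    return {
        "total_jobs": int(jobs.group(1)) if jobs else None,
        "finished_jobs": len(finished),
        "current_job": int(last.group(1)) if last else None,
        "restored_cpu_seconds": float(last.group(2)) if last else None,
    }


if __name__ == "__main__":
    print(summarize_cep2_log(SAMPLE))
```

Since a job only checkpoints when it finishes, the number of finished jobs is what actually survives a restart, not the progress percentage shown in the manager.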
||
|
|
Muckebadscher
Cruncher Joined: Jun 13, 2006 Post Count: 8 Status: Offline Project Badges:
|
Thanks for your feedback... I will test stopping BOINC Manager manually before shutting down the PC and will have a look at the described file. It could be that CEP2 needs much more time for saving the data, as the CEP2 WUs are the biggest I'm running, at approx. 25 MB. I will post my feedback after having tested all possibilities with CEP2.
|
||
|
|
Muckebadscher
Cruncher Joined: Jun 13, 2006 Post Count: 8 Status: Offline Project Badges:
|
I have now tested stopping BOINC Manager before shutting down Win 7 on two CEP WUs:
That works. I noticed that when CEP WUs are running, BOINC Manager needs approx. 10-15 sec to shut down. Obviously Windows is not giving BOINC Manager that time when Windows itself wants to shut down. Thanks for your help.
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
There's a registry "hack" to force Windows to wait for BOINC to close [see the Vista FAQ, item 5 in the Start Here read-only forum: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=12856 ]. This "hack" was intended to overcome another crash error, but it may work here too.
Also, when coming out of hibernation, the system may be too busy. With CEP2 and LAIM on [of course], you could suspend computing manually in the Activity menu before hibernating [snooze won't work, as it keeps counting down even while the computer is asleep]. When the computer is started again, go to the BOINC Manager and manually resume computing, avoiding startup competition for CPU cycles and disk I/O. That is certainly much better, as exiting the client always loses progress. But I've hibernated so many times on W7-32/64 with v7 clients without ever seeing a task return to its last checkpoint, let alone crash back to the beginning.
||
|
|
Muckebadscher
Cruncher Joined: Jun 13, 2006 Post Count: 8 Status: Offline Project Badges:
|
I think I have this kind of problem because Windows is running on an SSD, so very fast, and BOINC "only" on a normal SATA HDD.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, don't let me stop you from trying the registry mod as noted in the Vista FAQ, specifically, this key which means 60 seconds kill delay:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control]
"WaitToKillServiceTimeout"="60000"

Since my Windows and BOINC are on the same physical drive, but on separate partitions to minimize fragmenting, I would not notice that. I did look with regedit.exe in my W7-64 registry, and presently it's set to what I suspect is the default, 12000.

Edit: But it's a function called at shutdown, not at hibernation [when a state snapshot of memory is taken, then saved to the hiberfil.sys file in root].
[Edit 1 times, last edit by Former Member at Feb 28, 2013 1:25:30 PM]
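To check what the value currently is without opening regedit.exe, something like the following read-only Python sketch should do. It uses the standard-library winreg module and the key and value name quoted above; the function names are made up for illustration, and it assumes the value is stored as a string of milliseconds, as in the snippet above. It does not change anything in the registry.

```python
import sys

# Key path and value name as quoted in this thread.
CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control"
VALUE_NAME = "WaitToKillServiceTimeout"


def timeout_ms_to_seconds(raw):
    """Convert the registry value (milliseconds, stored as text) to seconds."""
    return int(str(raw).strip()) / 1000.0


def read_kill_timeout_seconds():
    """Return the current timeout in seconds, or None if it cannot be read."""
    if sys.platform != "win32":
        return None  # winreg only exists on Windows
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CONTROL_KEY) as key:
            raw, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except OSError:
        return None  # key or value missing, or access denied
    return timeout_ms_to_seconds(raw)


if __name__ == "__main__":
    print("WaitToKillServiceTimeout (s):", read_kill_timeout_seconds())
```

With the values mentioned here, "60000" comes out as 60 seconds and "12000" as 12 seconds.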
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
To add, an interesting key found in the same place as WaitToKillServiceTimeout is PreshutdownOrder. A Google search turned up this MS support article: http://support.microsoft.com/kb/146092 (but no word of PreshutdownOrder) and this: http://serverfault.com/questions/34427/windows-service-dependencies (for the adventurous and inquisitive).
Edit: more on PreshutdownOrder in this article, which is linked from the one above: http://blogs.technet.com/b/askperf/archive/20...n-and-crash-handling.aspx
[Edit 1 times, last edit by Former Member at Feb 28, 2013 2:13:56 PM]
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Had problems with hibernation. When I hibernated my computer last night my WU was at 24%; now it's back to 5%. Feels like I'm wasting my time running these units, as I don't have a problem with losing information on any of the other projects. I don't like leaving my laptop on all day, and it only had 7 minutes of battery left when I decided to hibernate the system last night. I usually let the computer sleep when I'm not using the laptop.
|
||
|
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1407 Status: Recently Active Project Badges:
|
I loaded some CEP2 tasks on my wife's laptop. By accident, I think, because that machine is shut down every evening and is sometimes off for more than a day.
Because of the known long intervals between checkpoints (sometimes even 6 hours), I decided to test the sleep and hibernate modes instead of letting the laptop run through the night. Both modes worked fine, but before switching to sleep or hibernate, I first suspended all tasks with 'leave application in memory' on. After restart I resumed the CEP2 task and it continued exactly from the point where I left it, without any loss.
[Edit 2 times, last edit by Crystal Pellet at Mar 9, 2013 10:13:10 AM]
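The suspend-before-sleep routine described above could probably be scripted as well. This is only a rough Python sketch: it assumes the boinccmd command-line tool that ships with the BOINC client is on the PATH and uses its --set_run_mode option, where "never" suspends computing and "auto" goes back to running based on preferences. The helper names are made up, and 'leave application in memory' still has to be enabled in the preferences; the script does not touch that setting.

```python
import subprocess

VALID_MODES = ("always", "auto", "never")


def run_mode_command(mode, boinccmd="boinccmd"):
    """Build the boinccmd argument list for a given run mode."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of: " + ", ".join(VALID_MODES))
    return [boinccmd, "--set_run_mode", mode]


def set_run_mode(mode):
    """Ask the local BOINC client to switch run mode (needs boinccmd on PATH)."""
    subprocess.run(run_mode_command(mode), check=True)


if __name__ == "__main__":
    # Before sleep or hibernate: suspend computing.
    set_run_mode("never")
    # After waking up, once the system has settled: set_run_mode("auto")
```

Run it once with "never" before putting the machine to sleep, and again with "auto" after it has woken up and the disk has calmed down.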
||
|
|
|