| World Community Grid Forums
Thread Status: Active Total posts in this thread: 26
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3315 Status: Offline Project Badges:
|
Okay, it looks like I have a workunit behaving like this on my Windows 10 laptop (AMD Ryzen 5 2500U). It should show about 2 hours of CPU time, like the other workunits (SCC) that were downloaded at the same time and are running properly.
Looking at stderr, it looks like it never checkpointed and always restarted from the beginning. "Starting work on structure: _0001" shows up 5 times, each with a timestamp and a bunch of other lines like this:

[2020- 4- 1 22:34:32:] :: BOINC:: Initializing ... ok.
[2020- 4- 1 22:34:32:] :: BOINC :: boinc_init()
INFO: result number = 0
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: projects/www.worldcommunitygrid.org/wcgrid_mip1_rosetta_7.16_windows_intelx86 -in::file::zip MIP1_databasev2.zip @./MIP1_00287409.flags -out::file::silent result_silent.out -run:jran 501864742 -nstruct 1 -out::level 100 -run::no_scorefile true
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/www.worldcommunitygrid.org/mip1.MIP1_databasev2.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
set_shared_memory_fully_initialized ...
abrelax ...
abrelax.run
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting work on structure: _0001
[2020- 4- 1 23:13:38:] :: BOINC:: Initializing ... ok.
[2020- 4- 1 23:13:38:] :: BOINC :: boinc_init()
INFO: result number = 0
...
On the Ryzen 1400 running Xubuntu 20.04, I've seen the MIP tasks saving checkpoints. What could be causing this? This task has now run for 20 CPU minutes without any checkpoints.

Edit: Almost 40 CPU minutes without a checkpoint, progress at 40%, and no change in the stderr file. I downloaded 2 other MIP tasks just to see if they checkpointed; no checkpoint after 10 CPU minutes, although it's probably early.

- AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
- AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
- AMD Ryzen 7 7730U 8C/16T 3.0 GHz
[Edit 1 times, last edit by Falconet at Apr 4, 2020 11:11:13 PM] |
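The restart pattern described above can be checked mechanically rather than by eye. The following is only a rough sketch in Python: it assumes the stderr text has been copied out of the task's result page into a string, and that the timestamps and marker lines look exactly like the excerpt quoted in the post. The function name `summarize_restarts` is made up for illustration and is not part of BOINC or the MIP application.

```python
import re
from datetime import datetime

# Timestamp layout copied from the excerpt, e.g. "[2020- 4- 1 22:34:32:]".
# Assumption: other fields may be space-padded the same way the month and day are.
STAMP = re.compile(
    r"\[(\d{4})-\s*(\d{1,2})-\s*(\d{1,2})\s+(\d{1,2}):\s*(\d{1,2}):\s*(\d{1,2}):\]"
)
INIT_MARKER = ":: BOINC:: Initializing"       # first line of each (re)start
START_MARKER = "Starting work on structure:"  # printed when folding begins


def summarize_restarts(stderr_text):
    """Count how often work started and collect the time of each initialization."""
    init_times = []
    for match in STAMP.finditer(stderr_text):
        # Only keep stamps that are directly followed by the init marker,
        # so the second stamp of each block (boinc_init) is not counted twice.
        tail = stderr_text[match.end():match.end() + 40].lstrip()
        if tail.startswith(INIT_MARKER):
            init_times.append(datetime(*(int(g) for g in match.groups())))
    return {
        "starts": stderr_text.count(START_MARKER),
        "init_times": init_times,
    }
```

If `starts` is greater than 1 for a task whose command line shows a single structure, the same structure was begun more than once, which matches the "restarted from the beginning" reading. The gaps between consecutive `init_times` show when each restart happened.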
||
|
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 328 Status: Offline Project Badges:
|
In the stderr output the command contains '-nstruct 1'. I believe this means there is only one structure in the work unit and MIP only checkpoints after completing work on a structure. Therefore it will only checkpoint once at the end.
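To make that reasoning concrete, here is a small Python sketch that pulls the `-nstruct` value out of a command line like the one in the stderr output. The helper names are hypothetical, and `checkpoints_before_finish` simply encodes the belief stated above (one checkpoint per completed structure); it is not confirmed behaviour of the MIP application.

```python
import shlex


def nstruct_from_command(command):
    """Return the integer following "-nstruct", or None if the flag is absent."""
    tokens = shlex.split(command)
    for index, token in enumerate(tokens[:-1]):
        if token == "-nstruct":
            return int(tokens[index + 1])
    return None


def checkpoints_before_finish(nstruct):
    """ASSUMPTION: a checkpoint is written only when a structure completes.

    Under that assumption, the last structure's checkpoint coincides with the
    end of the task, so only nstruct - 1 checkpoints can help after a restart.
    """
    return max(nstruct - 1, 0)
```

For the command quoted in the first post, `nstruct_from_command` gives 1 and `checkpoints_before_finish(1)` gives 0, which would be consistent with a task that starts over after every interruption.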
|
||
|
|
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3315 Status: Offline Project Badges:
|
"In the stderr output the command contains '-nstruct 1'. I believe this means there is only one structure in the work unit and MIP only checkpoints after completing work on a structure. Therefore it will only checkpoint once at the end."

Well, I guess that explains it then. A bummer if you don't keep your device turned on for the whole duration of the task.

Thanks

- AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
- AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
- AMD Ryzen 7 7730U 8C/16T 3.0 GHz
[Edit 1 times, last edit by Falconet at Apr 4, 2020 11:23:11 PM] |
||
|
|
niyar
Cruncher Joined: Mar 18, 2015 Post Count: 12 Status: Offline Project Badges:
|
Thanks a lot, guys. So is there a way to ask the programmers to build more frequent checkpoints into the work tasks? It seems like this would be the easiest and best solution.
|
||
|
|
TOMinAZ
Cruncher United States Joined: Feb 11, 2007 Post Count: 40 Status: Offline Project Badges:
|
I was seeing the same thing with the Microbiome project, for about a week. I looked over some of the solutions on this thread, but then I switched to Smash Childhood Cancer, and now it does checkpoints and saves progress on disk like it's supposed to. Unless I see it happen again, I assume it's something in the Microbiome project.
|
||
|
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline Project Badges:
|
The 'problem' with MIP is that the checkpoints are unpredictable. Sometimes there are many. Other times, there are few, if any.

If you shut down while running any project, you lose any work processed since the last checkpoint. Hibernating avoids the problem. Otherwise, select the unit under Tasks and click on Properties. There it will tell you the total CPU time and the CPU time since the last checkpoint. If they are the same, then shutting down would lose you the whole time.

Basically, I hibernate my laptop if I want to power down. My tower is 24/7.

Mike |
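The comparison Mike describes is simple arithmetic, sketched below in Python. It assumes the two values are typed in by hand as "HH:MM:SS" strings read off the task's properties; the function names are invented for this example and nothing here talks to BOINC itself.

```python
def to_seconds(hms):
    """Convert an "HH:MM:SS" string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def work_at_risk(cpu_time, since_checkpoint):
    """Return (seconds lost on shutdown, fraction of total CPU time lost)."""
    total = to_seconds(cpu_time)
    at_risk = to_seconds(since_checkpoint)
    # Guard against a task that has not accumulated any CPU time yet.
    fraction = at_risk / total if total else 0.0
    return at_risk, fraction
```

A fraction of 1.0 means the two times are the same, so shutting down would throw away everything done so far; hibernating instead keeps it.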
||
|
|
|