World Community Grid Forums
Category: Completed Research Forum: Microbiome Immunity Project Thread: Checkpoint frequency |
Thread Status: Active Total posts in this thread: 7
thenlec
Cruncher Joined: Jul 18, 2016 Post Count: 6 Status: Offline Project Badges: |
Please consider increasing the checkpoint frequency for these jobs. Sometimes there are several hours of compute time between checkpoints, and this is very inconvenient for "citizen scientists" trying to donate their spare compute on laptops that are stopped and started frequently. It's very frustrating to have 2 or 3 hours of compute time lost when I need to pack up my laptop for any reason. May I suggest a checkpoint at least every 15 minutes?
Thank you. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Don't know what you're running this on, but my logs show a much higher frequency:
Starting work on structure: _0001
Finished _0001 in 1353.02 seconds.
Starting work on structure: _0002
Finished _0002 in 2123.47 seconds.
Starting work on structure: _0003
Finished _0003 in 1443.66 seconds.
Starting work on structure: _0004
Finished _0004 in 2308.38 seconds.
Starting work on structure: _0005
Finished _0005 in 1603.03 seconds.
DONE :: 5 structures in 8865.5 cpu seconds
For technical reasons it is most unlikely that the techs could do anything about the intervals. It's either writing a few hundred kilobytes or the whole memory model, and the application already has a serious performance issue: it is 32-bit and hungry for certain specific CPU elements, so much so that if you run more than a few concurrently the runtimes increase substantially. That's why it's recommended to limit the number of concurrent tasks allowed for this science through the app_config.xml configuration. Here I allow a max of 3, but already 2 shows quite a bit shorter runtime. |
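For anyone who has not set up an app_config.xml before, here is a minimal sketch in Python that writes out the kind of limit described above. It assumes the MIP application's short name is "mip1" (an assumption; check the app name your own client reports) and uses the limit of 3 mentioned in this post. The helper name build_app_config is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that caps concurrent tasks for one app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # ET.indent only exists on Python 3.9+; the XML is valid without it.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "mip1" is an assumed app short name; 3 is the limit quoted in the post.
config_xml = build_app_config("mip1", 3)
print(config_xml)
```

The printed text would be saved as app_config.xml in the project's folder inside the BOINC data directory, and the client then needs to re-read its config files (or be restarted) before the limit applies.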
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7545 Status: Offline Project Badges: |
"It's very frustrating to have 2 or 3 hours of compute time lost when I need to pack up my laptop for any reason."
What are the specs on your laptop, and at what percentage are you running it?
Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers*
[Edit 2 times, last edit by Sgt.Joe at Mar 6, 2020 10:44:47 PM] |
||
|
thenlec
Cruncher Joined: Jul 18, 2016 Post Count: 6 Status: Offline Project Badges: |
Thank you for the replies. It is admittedly a low-power machine: a Microsoft Surface 3 (non-Pro) with an Intel Atom x7-Z8700 CPU @ 1.60GHz, 4GB memory, Windows 10. I run 50% of CPUs at 100% utilization. I do not have a MIP task running at the moment, but the checkpoint I'm referring to is found when viewing the task properties. For example, here is an MCM task:
----------------------------------------
Application: Mapping Cancer Markers 7.41
Name: MCM1_0160303_8294
State: Running
Received: 3/9/2020 12:09:33 AM
Report deadline: 3/16/2020 12:09:32 AM
Estimated computation size: 48,015 GFLOPs
CPU time: 06:00:40
CPU time since checkpoint: 00:06:59 <<====== this one
Elapsed time: 05:41:03
Estimated time remaining: 03:04:23
Fraction done: 61.285%
Virtual memory size: 77.37 MB
Working set size: 38.53 MB
Directory: slots/0
Process ID: 2524
Progress rate: 10.440% per hour
Executable: wcgrid_mcm1_map_7.41_windows_x86_64
The MCM tasks' "CPU time since checkpoint" is usually a few minutes. For MIP tasks it can often be several hours. I was hoping this might be configurable. It is not a problem for my other machines, which crunch 24x7, but for a laptop that is powered up and down several times a day it can be frustrating to start over after a couple of hours of compute.
Thanks again.
[Edit 1 times, last edit by thenlec at Mar 9, 2020 10:52:59 PM] |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12120 Status: Offline Project Badges: |
If you find MIP a problem, do not try ARP. That is routinely 3 hours between checkpoints on my i7.
Mike |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12120 Status: Offline Project Badges: |
I am running 2 mip at present. One checkpointed at 50%, but the other hasn't checkpointed yet and it has reached 60%.
Any ideas? MIKE |
||
|
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 858 Status: Offline Project Badges: |
"I am running 2 mip at present. One checkpointed at 50%, but the other hasn't checkpointed yet and it has reached 60%. Any ideas? MIKE"
Mike,
[I think I've got the terminology right in what follows...]
MIP tasks comprise one or more structures with varying sequence lengths. The longer the sequences, the fewer structures are processed - this keeps the run times within some sort of reasonable bounds!
Checkpoints are only taken between structures, so a task with only one structure will sail past 50% without doing one - your question answered! And there have been quite a few of these recently, some of them very long-running as they have very long sequences.
The "worst" case I've seen recently related to tasks from the MIP1_00282297 batch, which seemed to have one structure and sequence length 646 - they were taking nearly twice the average MIP1 run-time. (There have been worse; I have a note of sequence lengths of 1016 and 1026 in single-structure tasks in older records, and those were taking almost three times as long as the average...)
Cheers - Al.
[Edited to improve(?) clarity]
[Edit 1 times, last edit by alanb1951 at Mar 12, 2020 3:04:32 AM] |
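If checkpoints really are taken only between structures, then each "Finished ... in N seconds" line in the MIP log quoted earlier in this thread gives a rough upper bound on the CPU time at risk while that structure is running. A small Python sketch of that arithmetic, using the five-structure log from that post as sample input (the assumption is exactly one checkpoint at the end of every structure, which is an inference from the explanation above, not something the project has confirmed; the names here are made up for illustration):

```python
import re

# Sample log copied from the earlier post in this thread.
LOG = """\
Starting work on structure: _0001
Finished _0001 in 1353.02 seconds.
Starting work on structure: _0002
Finished _0002 in 2123.47 seconds.
Starting work on structure: _0003
Finished _0003 in 1443.66 seconds.
Starting work on structure: _0004
Finished _0004 in 2308.38 seconds.
Starting work on structure: _0005
Finished _0005 in 1603.03 seconds.
DONE :: 5 structures in 8865.5 cpu seconds
"""

FINISHED = re.compile(r"Finished (\S+) in ([0-9.]+) seconds")


def structure_times(log_text):
    """Return (structure name, CPU seconds) for each finished structure."""
    return [(m.group(1), float(m.group(2))) for m in FINISHED.finditer(log_text)]


gaps = structure_times(LOG)
longest_name, longest_seconds = max(gaps, key=lambda item: item[1])

# Assumed: one checkpoint per structure, so each time is one checkpoint gap.
for name, seconds in gaps:
    print(f"{name}: about {seconds / 60:.1f} minutes between checkpoints")
print(f"Longest gap: {longest_name} at about {longest_seconds / 60:.1f} minutes")
```

On that five-structure task the gaps stay under 40 minutes, whereas a single-structure task has no intermediate "Finished" line at all, which fits the multi-hour "CPU time since checkpoint" values reported earlier in the thread.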
||
|
|