Thread Status: Active
Total posts in this thread: 7
This topic has been viewed 1460 times and has 6 replies
thenlec
Cruncher
Joined: Jul 18, 2016
Post Count: 6
Status: Offline
Checkpoint frequency

Please consider increasing the checkpoint frequency for these jobs. Sometimes there are several hours of compute time between checkpoints, which is very inconvenient for "citizen scientists" trying to donate their spare compute on laptops that are stopped and started frequently. It's very frustrating to have 2 or 3 hours of compute time lost when I need to pack up my laptop for any reason. May I suggest a checkpoint at least every 15 minutes?
Thank you.
[Mar 6, 2020 8:27:40 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Checkpoint frequency

I don't know what you're running this on, but my logs show a much higher frequency:

Starting work on structure: _0001
Finished _0001 in 1353.02 seconds.
Starting work on structure: _0002
Finished _0002 in 2123.47 seconds.
Starting work on structure: _0003
Finished _0003 in 1443.66 seconds.
Starting work on structure: _0004
Finished _0004 in 2308.38 seconds.
Starting work on structure: _0005
Finished _0005 in 1603.03 seconds.
DONE :: 5 structures in 8865.5 cpu seconds

For technical reasons it is most unlikely that the techs could do anything about the intervals. A checkpoint means writing either a few hundred kilobytes or the whole memory model, and the application already has a serious performance issue: it is 32-bit and so hungry for certain specific CPU elements that runtimes increase substantially if you run more than a few tasks concurrently. That's why it's recommended to limit the number of concurrent tasks for this science through the app_config.xml configuration. Here I allow a maximum of 3, though even 2 already shows noticeably shorter runtimes.
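For anyone who wants to try the concurrency limit, here is a minimal sketch in Python that writes out an app_config.xml body. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements are the standard BOINC ones; the short application name `mip1` is an assumption on my part, so check the name your client actually reports before using it. The helper name `build_app_config` is purely illustrative.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return the text of a BOINC app_config.xml that caps one application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # ET.indent only exists on Python 3.9+; it is cosmetic, so skip it if missing.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "mip1" is an assumed short name for the MIP application.
    print(build_app_config("mip1", 3))
```

The output would be saved as app_config.xml in the project's folder inside the BOINC data directory, after which the client needs to re-read its config files (or be restarted) for the limit to apply.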
[Mar 6, 2020 8:54:55 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7545
Status: Offline
Re: Checkpoint frequency

It's very frustrating to have 2 or 3 hours of compute time lost when I need to pack up my laptop for any reason.

What are the specs on your laptop, and at what percentage are you running it?
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 2 times, last edit by Sgt.Joe at Mar 6, 2020 10:44:47 PM]
[Mar 6, 2020 10:43:44 PM]
thenlec
Cruncher
Joined: Jul 18, 2016
Post Count: 6
Status: Offline
Re: Checkpoint frequency

Thank you for the replies. It is admittedly a low-power machine: a Microsoft Surface 3 (non-Pro) with an Intel Atom x7-Z8700 CPU @ 1.60GHz, 4GB memory, Windows 10. I run 50% of the CPUs at 100% utilization. I do not have a MIP task running at the moment, but the checkpoint I'm referring to is found when viewing the task properties. For example, here is an MCM task:

Application Mapping Cancer Markers 7.41
Name MCM1_0160303_8294
State Running
Received 3/9/2020 12:09:33 AM
Report deadline 3/16/2020 12:09:32 AM
Estimated computation size 48,015 GFLOPs
CPU time 06:00:40
CPU time since checkpoint 00:06:59 <<====== this one
Elapsed time 05:41:03
Estimated time remaining 03:04:23
Fraction done 61.285%
Virtual memory size 77.37 MB
Working set size 38.53 MB
Directory slots/0
Process ID 2524
Progress rate 10.440% per hour
Executable wcgrid_mcm1_map_7.41_windows_x86_64

For MCM tasks the "CPU time since checkpoint" is usually a few minutes. For MIP tasks it can often be several hours. I was hoping this might be configurable. It is not a problem for the other machines I have crunching 24x7, but for a laptop that is powered up and down several times a day it can be frustrating to start over after a couple of hours of compute.
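To check how much work is at risk before shutting a laptop down, the value can be pulled out of a pasted copy of the task properties. This is only a rough sketch: it assumes the properties text looks like the listing in this post, and the function names `hms_to_seconds` and `cpu_since_checkpoint` are made up for illustration.

```python
import re

# A few lines copied from the MCM task properties shown in this post.
SAMPLE_PROPERTIES = """\
CPU time 06:00:40
CPU time since checkpoint 00:06:59 <<====== this one
Elapsed time 05:41:03
"""


def hms_to_seconds(text):
    """Convert an HH:MM:SS string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_since_checkpoint(properties_text):
    """Return the seconds of CPU time since the last checkpoint, or None if absent."""
    match = re.search(
        r"^CPU time since checkpoint\s+(\d+:\d{2}:\d{2})",
        properties_text,
        re.MULTILINE,
    )
    return hms_to_seconds(match.group(1)) if match else None


if __name__ == "__main__":
    at_risk = cpu_since_checkpoint(SAMPLE_PROPERTIES)
    print(f"CPU seconds that would be lost on shutdown: {at_risk}")
```

If the value is large, it may be worth waiting for the next checkpoint before packing up.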
Thanks again.
----------------------------------------
[Edit 1 times, last edit by thenlec at Mar 9, 2020 10:52:59 PM]
[Mar 9, 2020 10:50:51 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12120
Status: Offline
Re: Checkpoint frequency

If you find MIP a problem, do not try ARP. That routinely goes 3 hours between checkpoints on my i7.

Mike
[Mar 10, 2020 12:04:50 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12120
Status: Offline
Re: Checkpoint frequency

I am running 2 MIP tasks at present. One checkpointed at 50%, but the other hasn't checkpointed yet and it has reached 60%.

Any ideas?

MIKE
[Mar 11, 2020 8:31:43 PM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 858
Status: Offline
Re: Checkpoint frequency

I am running 2 MIP tasks at present. One checkpointed at 50%, but the other hasn't checkpointed yet and it has reached 60%.

Any ideas?

MIKE

Mike,

[I think I've got the terminology right in what follows...]

MIP tasks comprise one or more structures with varying sequence lengths. The longer the sequences, the fewer structures are processed - this keeps the run times within some sort of reasonable bounds!

Checkpoints are only taken between structures, so a task with only one structure will sail past 50% without doing one - your question answered! And there have been quite a few of these recently, some of them very long-running as they have very long sequences.

The "worst" case I've seen recently related to tasks from the MIP1_00282297 batch, which seemed to have one structure with a sequence length of 646; they were taking nearly twice the average MIP1 run-time. (There have been worse: I have a note of sequence lengths of 1016 and 1026 in single-structure tasks in older records, and those were taking almost three times as long as the average...)
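Since checkpoints only happen between structures, the per-structure times in a task's log give a rough idea of the gaps between checkpoints. Here is a small sketch in Python that reads log lines in the format posted earlier in this thread; the function name `structure_times` is just illustrative, and it assumes every structure reports a "Finished ... in N seconds." line.

```python
import re

# Log lines copied from the earlier post in this thread.
SAMPLE_LOG = """\
Starting work on structure: _0001
Finished _0001 in 1353.02 seconds.
Starting work on structure: _0002
Finished _0002 in 2123.47 seconds.
Starting work on structure: _0003
Finished _0003 in 1443.66 seconds.
Starting work on structure: _0004
Finished _0004 in 2308.38 seconds.
Starting work on structure: _0005
Finished _0005 in 1603.03 seconds.
"""

FINISHED = re.compile(r"Finished\s+(\S+)\s+in\s+([\d.]+)\s+seconds")


def structure_times(log_text):
    """Return (structure, seconds) pairs, one per finished structure."""
    return [(m.group(1), float(m.group(2))) for m in FINISHED.finditer(log_text)]


if __name__ == "__main__":
    times = structure_times(SAMPLE_LOG)
    # Assumption: one checkpoint per finished structure, so the longest
    # structure is roughly the longest stretch without a checkpoint.
    name, longest = max(times, key=lambda pair: pair[1])
    print(f"{len(times)} structures, longest gap {longest / 60:.1f} min ({name})")
```

A task whose log shows only one "Starting work on structure" line would have no entries to split the run, which matches the single-structure case described above.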

Cheers - Al.
[Edited to improve(?) clarity]
----------------------------------------
[Edit 1 times, last edit by alanb1951 at Mar 12, 2020 3:04:32 AM]
[Mar 12, 2020 2:56:54 AM]