| World Community Grid Forums
Thread Status: Active · Total posts in this thread: 6
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
I came here again, because I am a menopausal insomniac who noticed while waiting for an HCC checkpoint, that it didn't save when I thought it should've. I couldn't figure out why.
I have a couple of questions after reading "Project Checkpoint Saving - How to Minimize Progress Loss on Close/Restart" in the FAQs:

    With BOINC, the default minimum disk write setting in the device profile is 60 seconds. This value can be increased (Write to disk at most every: 999 seconds), BUT increasing it will postpone the checkpoint saving as programmed into the science application. E.g., setting 999 seconds with Genome Comparison, which saves around every 600 seconds, would delay the checkpoint save until the next one, i.e. around 1,200 seconds. For programs that do checkpoint saves for each segment/attempt/seed completed, the save is postponed until permitted by the profile setting, i.e. at the first opportunity after the exampled 999 seconds. Generally the default of 60 seconds should be fine for almost all, unless one wants to reduce disk I/O.

Question #1: BOINC had many opportunities to do a disk write (48%, 49%, 50%, etc.) but didn't. My computer isn't fast enough to complete a whole HCC percentage point in 60 seconds; each percentage point takes 2-7 minutes. If I'm interpreting the quote above correctly (and it is entirely possible that I am not), a disk write should happen within 60 seconds after each HCC percentage point. So why are my disk writes taking 15-20 minutes to occur?

Question #2: What is the rationale for BOINC's default 60-second-minimum rule instead of the option of immediate disk writing? (I say 'option' because I realize that with some projects immediate disk writing would be undesirable.)

FYI, although I can't imagine why it would matter:
• I no longer have two work units running concurrently; I am back to only one tab/work unit.
• Processor: Intel Pentium 4
• CPU: 3.06 GHz
• OS: Windows XP Home Edition, SP2
• Memory: 1022.79 MB physical, 2.41 GB virtual
• Disk: 223.58 GB total, 95.96 GB free
• Antivirus: Norton 2008
• Firewall: ZA 6.5.731.000
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Where you are going wrong: applications don't try to checkpoint at every percentage point.
The way it works: each science application has points during the execution of a work unit where it is convenient to checkpoint. Some can checkpoint almost whenever they want to (this is rare); such applications are set up by WCG to checkpoint about every 10 minutes. Other applications aren't so flexible: they take what opportunities they can, usually after a round of computation has completed, when the amount of data that needs saving is small.

So how does this mesh with the disk write limitation? If the disk write limit says it is too soon to write to disk, the checkpoint opportunity is missed and you have to wait for the next one. If, by the time the next opportunity comes, you have gone past the limit, the checkpoint occurs (along with the disk write) and the timer is reset. And so on.

So, assuming the application is trying to checkpoint less frequently than the limit, each checkpoint opportunity will be taken.
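The gating described above can be sketched in a few lines. This is a toy model, not BOINC's actual source; the `CheckpointGate` class and the timings are illustrative only, using the Genome Comparison figures quoted from the FAQ (checkpoint opportunities roughly every 600 s against a 999 s disk-write limit):

```python
class CheckpointGate:
    """Toy model of the 'write to disk at most every N seconds' setting:
    a checkpoint opportunity offered by the science application is
    accepted only if the limit has elapsed since the last write."""

    def __init__(self, limit_seconds):
        self.limit = limit_seconds
        self.last_write = 0.0  # time of the last accepted checkpoint

    def offer(self, now):
        """The app offers a checkpoint opportunity at time `now` (seconds).
        Returns True if the checkpoint (and disk write) happens."""
        if now - self.last_write < self.limit:
            return False          # too soon: the opportunity is missed
        self.last_write = now     # checkpoint written, timer reset
        return True

# FAQ example: opportunities every ~600 s, profile limit set to 999 s.
gate = CheckpointGate(999)
results = [(t, gate.offer(t)) for t in (600, 1200, 1800, 2400)]
# 600 s is skipped (under the limit), 1200 s is written, 1800 s is
# skipped again (only 600 s since the last write), 2400 s is written.
```

With the default 60-second limit instead, every one of those opportunities would clear the gate, which is the "each checkpoint opportunity will be taken" case above.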
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Didactylos:

    ...each science application has points during the execution of a work unit where it is convenient to checkpoint...

So how is the checkpointing programmed for HCC?
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
It checkpoints after each filter round*. HCC is a little unusual in that the first round takes about 5 minutes, but this increases, and by the end of the work unit it is 30 minutes to an hour (depending on your computer's speed, of course).

* This means it just needs to save the result of the filter, not all the complex data needed halfway through filtering. And when resuming from a checkpoint, it just needs to start on the next filter, instead of trying to pick up halfway.
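This also answers Question #1 from the opening post: with the default 60-second limit, every round-end opportunity clears the gate, so the gap between disk writes equals the filter-round length, not 60 seconds. A quick simulation, with made-up round lengths growing as described above:

```python
# Toy timeline of HCC checkpoints under the default 60 s disk-write
# limit. The round lengths (in minutes) are illustrative only.
LIMIT = 60  # seconds
round_minutes = [5, 8, 12, 18, 25, 40, 60]

t = 0.0
last_write = 0.0
writes = []  # times (seconds) at which a checkpoint is written
for minutes in round_minutes:
    t += minutes * 60             # a filter round finishes
    if t - last_write >= LIMIT:   # opportunity clears the limit
        writes.append(t)
        last_write = t

# Gaps between writes, in minutes: since every round is far longer
# than 60 s, every opportunity is taken and the gaps simply equal
# the round lengths (5, 8, 12, ... minutes).
gaps = [(b - a) / 60 for a, b in zip([0.0] + writes, writes)]
```

So a computer that takes 2-7 minutes per percentage point, on rounds that grow to 15-20 minutes or more, will see exactly the 15-20 minute write intervals the opening post describes.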
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Thank you.
Sekerob
Ace Cruncher · Joined: Jul 24, 2005 · Post Count: 20043 · Status: Offline
Start Here FAQ updated, adding the HCC checkpoint observations.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All!