Thread Status: Active
Total posts in this thread: 15
This topic has been viewed 1844 times and has 14 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Checkpoint for HPF - phase 2 and logging

Hi all,
I'm running BOINC 5.8.16 with BOINC Manager. I tried this thread, but in the GNU/Linux install there is a separate projects directory and a separate slots directory. Only 3 checkpoints have been shown in the Messages window, and all of them in the morning at 11:00 a.m. I exited the program in the evening, hoping that when I started it again it would make a checkpoint or at least give some indication of one. I even suspended the task, but while that set me back by a couple of percentage points, there was no indication of when the last checkpoint happened or when the next one will.

Tuesday 15 May 2007 11:08:33 PM IST||Starting BOINC client version 5.8.16 for i686-pc-linux-gnu
Tuesday 15 May 2007 11:08:33 PM IST||log flags: task, file_xfer, sched_ops, checkpoint_debug
Tuesday 15 May 2007 11:08:33 PM IST||Libraries: libcurl/7.16.0 OpenSSL/0.9.8d zlib/1.2.3
Tuesday 15 May 2007 11:08:33 PM IST||Data directory: /home/shirish/boinc/BOINC
Tuesday 15 May 2007 11:08:33 PM IST||Processor: 1 GenuineIntel Intel(R) Pentium(R) 4 CPU 1.80GHz [Family 15 Model 1 Stepping 2][fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm up]
Tuesday 15 May 2007 11:08:33 PM IST||Memory: 622.57 MB physical, 1.91 GB virtual
Tuesday 15 May 2007 11:08:33 PM IST||Disk: 60.50 GB total, 47.17 GB free
Tuesday 15 May 2007 11:08:33 PM IST|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 185430; location: (none); project prefs: default
Tuesday 15 May 2007 11:08:33 PM IST||General prefs: from World Community Grid (last modified 2007-05-12 20:37:21)
Tuesday 15 May 2007 11:08:33 PM IST||Host location: none
Tuesday 15 May 2007 11:08:33 PM IST||General prefs: using your defaults
Tuesday 15 May 2007 11:08:33 PM IST||Reading preferences override file
Tuesday 15 May 2007 11:08:35 PM IST|World Community Grid|Restarting task lb862_00071_14 using hpf2 version 519
Tuesday 15 May 2007 11:58:46 PM IST|World Community Grid|[task_debug] task_state=QUIT_PENDING for lb862_00071_14 from preempt
Tuesday 15 May 2007 11:58:48 PM IST|World Community Grid|[task_debug] Process for lb862_00071_14 exited
Tuesday 15 May 2007 11:58:48 PM IST|World Community Grid|[task_debug] task_state=UNINITIALIZED for lb862_00071_14 from handle_exited_app
Tuesday 15 May 2007 11:58:48 PM IST|World Community Grid|[task_debug] exit status 0
Tuesday 15 May 2007 11:58:57 PM IST||[task_debug] ACTIVE_TASK::start(): forked process: pid 15123
Tuesday 15 May 2007 11:58:57 PM IST|World Community Grid|[task_debug] task_state=EXECUTING for lb862_00071_14 from start
Tuesday 15 May 2007 11:58:57 PM IST|World Community Grid|Restarting task lb862_00071_14 using hpf2 version 519

The slots directory contains only 2 slots. Slot 0 has this entry:

wcg_hpf2.last_pdb

Is this the last checkpoint, or is it something else?

This is the last checkpoint, which was written at 11 a.m. local time:

wcg_checkpoint_02.ckp

It is now 1:05 a.m. Thursday local time.

Contents of the cc_config file:

<cc_config>
  <log_flags>
    <checkpoint_debug>1</checkpoint_debug>
    <task_debug>1</task_debug>
  </log_flags>
</cc_config>
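
For anyone who wants to double-check which log flags a cc_config file like the one above switches on, here is a minimal Python sketch. It only parses the XML quoted in this post and lists the flags set to 1. The function name enabled_log_flags is made up for illustration, and nothing here reflects how the BOINC client itself reads the file.

```python
import xml.etree.ElementTree as ET

# Same content as the cc_config file quoted in this post.
CC_CONFIG = """<cc_config>
<log_flags>
<checkpoint_debug>1</checkpoint_debug>
<task_debug>1</task_debug>
</log_flags>
</cc_config>"""


def enabled_log_flags(xml_text):
    """Return the names of log flags whose value is 1, in document order.

    Hypothetical helper: it checks the XML only, not the running client.
    """
    root = ET.fromstring(xml_text)
    log_flags = root.find("log_flags")
    if log_flags is None:
        return []
    return [flag.tag for flag in log_flags if (flag.text or "").strip() == "1"]


print(enabled_log_flags(CC_CONFIG))  # ['checkpoint_debug', 'task_debug']
```

If the file is not well-formed XML, ET.fromstring raises a ParseError, which is a quick way to spot a typo in the file.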

Looking forward to any help, suggestions, or improvements.
[May 15, 2007 7:37:03 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Checkpoint for HPF - phase 2 and logging

Each slot is used by a different work unit that has been started or is in progress. The slot stores the progress files, and a new one is written (overwriting the old one) each time a checkpoint is saved. An HPF2 work unit packs several tens of attempts, each ending with a save.

Here is a brief explanation from the Unofficial BOINC Wiki:
http://boinc-wiki.ath.cx/index.php?title=Slots_Directory

Resuming a job does not tell you which checkpoint was used. It merely tells you from what progress point, in CPU time and estimated percentage, the computation picks up again. There is no prediction of a future checkpoint. If checkpoint logging is activated, following the message log and seeing a series at, e.g., 25-minute intervals would suggest that the next one will probably also happen about 25 minutes after the last. With non-deterministic calculations, though, it is very possible that you will see material fluctuation.
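
As a rough illustration of that reasoning, the Python sketch below guesses the next checkpoint as the last observed time plus the median gap, and also reports the shortest and longest gap so the fluctuation is visible. The timestamps are hypothetical (made up to sit roughly 25 minutes apart), and estimate_next_checkpoint is not part of BOINC. It is only a guess from past behaviour, not a prediction the client makes.

```python
from datetime import datetime, timedelta
from statistics import median


def estimate_next_checkpoint(times):
    """Guess the next checkpoint as last time + median past gap.

    Returns (estimate, shortest_gap_s, longest_gap_s).
    Needs at least two timestamps, in chronological order.
    """
    if len(times) < 2:
        raise ValueError("need at least two checkpoint times")
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    estimate = times[-1] + timedelta(seconds=median(gaps))
    return estimate, min(gaps), max(gaps)


# Hypothetical checkpoint times, roughly 25 minutes apart.
seen = [
    datetime(2007, 5, 15, 11, 0, 0),
    datetime(2007, 5, 15, 11, 24, 0),
    datetime(2007, 5, 15, 11, 50, 0),
    datetime(2007, 5, 15, 12, 15, 0),
]
estimate, shortest, longest = estimate_next_checkpoint(seen)
print(estimate)                      # 2007-05-15 12:40:00
print(shortest / 60, longest / 60)   # 24.0 26.0
```

The wider the spread between the shortest and longest gap, the less the estimate should be trusted.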
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at May 15, 2007 8:07:00 PM]
[May 15, 2007 7:54:59 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Checkpoint for HPF - phase 2 and logging

Is it possible to have a user-defined checkpoint interval? Something like making a checkpoint every 30 minutes or every hour, depending on the user's needs? Or is it determined by the project itself?
[May 16, 2007 6:48:05 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Checkpoint for HPF - phase 2 and logging

Hi shirish,

I think you need to (re)read the checkpoint saving post in the Start Here forum. We can NOT influence when a checkpoint save is made.

Sekerob
[May 16, 2007 6:55:26 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Checkpoint for HPF - phase 2 and logging

Hello shirish,
I wish we could checkpoint at regular intervals. But an application program will normally be using arrays holding tens or even hundreds of megabytes of intermediate computations. The checkpoint code is inserted in the application program at points where a (relatively) small amount of data can capture the progress made. This varies from project to project. I believe that Genome Comparison offers many such points, so it checkpoints regularly: it is just a matter of checking the clock when a potential checkpoint is reached. Most application programs are much more difficult, though, and always checkpoint when they reach a suitable point, which may take a good deal of time to reach after the last one.
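
A toy Python sketch of the "check the clock at each potential checkpoint" idea follows. The safe-point times, the 600-second interval, and the function name are all assumptions for illustration. This is not the actual code of Genome Comparison or HPF2.

```python
def run_with_checkpoints(safe_point_times, min_interval):
    """Simulate a job that may only save state at given safe points.

    safe_point_times: seconds since start at which the job reaches a point
        where a small amount of data captures its progress (hypothetical).
    min_interval: minimum number of seconds between two saves.
    Returns the times at which a checkpoint is actually written.
    """
    checkpoints = []
    last_save = 0
    for now in safe_point_times:
        # Check the clock only when a potential checkpoint is reached.
        if now - last_save >= min_interval:
            checkpoints.append(now)
            last_save = now
    return checkpoints


# Many safe points (one per minute): saves land right on the interval.
frequent = [60 * i for i in range(1, 61)]
print(run_with_checkpoints(frequent, 600))
# [600, 1200, 1800, 2400, 3000, 3600]

# Few, irregular safe points: saves happen only when one is reached.
sparse = [1500, 1700, 4200, 4300, 9000]
print(run_with_checkpoints(sparse, 600))
# [1500, 4200, 9000]
```

With sparse safe points the requested interval hardly matters: the spacing of the saves is set by how long the program takes to reach the next suitable point.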

There is some discussion of this in Start Here at http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=11332

Lawrence

Added: I originally wrote that GC checkpoints at 10-minute intervals. Then I noticed that Sekerob said 20 minutes in his post. A search found a comment by Didactylos that said 10 minutes, but did not find the original information that we were working from. Well, I will stick with 10 minutes. But I may be wrong.

Added even later: 20-minute checkpoints for GC. Sekerob has proved his claim.

Even later: Maybe GC checkpoints every 20 minutes in Italy but every 10 minutes elsewhere?
Something peculiar is going on here.
----------------------------------------
[Edit 3 times, last edit by Former Member at May 16, 2007 5:51:22 PM]
[May 16, 2007 10:30:48 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Checkpoint for HPF - phase 2 and logging

My information was empirical. It was true at the time, but Genome Comparison has had at least one upgrade since then, so things may have changed.
[May 16, 2007 3:52:31 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Checkpoint for HPF - phase 2 and logging

I'm mum, except to say that this is WISIWYGT when activating the checkpoint logging feature available from v5.8.16:
2007-04-27 12:12:59 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 12:33:26 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 12:53:40 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 13:13:45 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 13:34:05 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 13:54:27 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 14:14:39 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 14:34:44 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
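
The gaps in a log like this can be worked out with a short Python sketch. It assumes the exact line layout shown above (timestamp, project name in brackets, then the [checkpoint_debug] tag). The regular expression and function names are illustrative and not anything shipped with BOINC.

```python
import re
from datetime import datetime

# The log lines quoted in this post.
LOG = """\
2007-04-27 12:12:59 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 12:33:26 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 12:53:40 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 13:13:45 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 13:34:05 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 13:54:27 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 14:14:39 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
2007-04-27 14:34:44 [World Community Grid] [checkpoint_debug] result 10000530-10001728_2 checkpointed
"""

# Assumed layout: "<date> <time> [<project>] [checkpoint_debug] result <name> checkpointed"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[[^\]]*\] "
    r"\[checkpoint_debug\] result \S+ checkpointed$"
)


def checkpoint_times(log_text):
    """Return the timestamps of all checkpoint_debug lines, in file order."""
    times = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Return the gaps between consecutive timestamps, in whole seconds."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


gaps = gaps_in_seconds(checkpoint_times(LOG))
print(gaps)                        # [1227, 1214, 1205, 1220, 1222, 1212, 1205]
print(sum(gaps) / len(gaps) / 60)  # 20.25
```

For the lines above, every gap falls between 1205 and 1227 seconds, so the average is about 20 minutes.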

[May 16, 2007 4:03:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Checkpoint for HPF - phase 2 and logging

That sure looks like a 20-minute checkpoint interval to me. I guess it changed during one of the upgrades.

I'll try to remember that.
[May 16, 2007 5:01:20 PM]
retsof
Former Community Advisor
USA
Joined: Jul 31, 2005
Post Count: 6824
Status: Offline
Re: Checkpoint for HPF - phase 2 and logging

It still depends on the computer. This one's doing a checkpoint about every 10 minutes.


5/16/2007 10:34:23 AM|World Community Grid|[checkpoint_debug] result 10000344-10001872_0 checkpointed
5/16/2007 10:44:23 AM|World Community Grid|[checkpoint_debug] result 10000344-10001872_0 checkpointed
5/16/2007 10:54:25 AM|World Community Grid|[checkpoint_debug] result 10000344-10001872_0 checkpointed
5/16/2007 11:04:27 AM|World Community Grid|[checkpoint_debug] result 10000344-10001872_0 checkpointed
5/16/2007 11:14:29 AM|World Community Grid|[checkpoint_debug] result 10000344-10001872_0 checkpointed
5/16/2007 11:24:30 AM|World Community Grid|[checkpoint_debug] result 10000344-10001872_0 checkpointed
5/16/2007 11:34:38 AM|World Community Grid|[checkpoint_debug] result 10000344-10001872_0 checkpointed
5/16/2007 11:44:43 AM|World Community Grid|[checkpoint_debug] result 10000344-10001872_0 checkpointed

----------------------------------------
SUPPORT ADVISOR
Work+GPU i7 8700 12threads
School i7 4770 8threads
Default+GPU Ryzen 7 3700X 16threads
Ryzen 7 3800X 16 threads
Ryzen 9 3900X 24threads
Home i7 3540M 4threads50%
----------------------------------------
[Edit 1 times, last edit by retsof at May 16, 2007 5:22:24 PM]
[May 16, 2007 5:19:27 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Checkpoint for HPF - phase 2 and logging

Now I'm curious. I seem to remember that the perpetual writing to disk was moved into RAM, the price being wider checkpoint intervals. Could it be driven by RAM size? I run 1 thread on 1.5 GB, but I am seeing 20 minutes on both a P4 and a C2D.

This is a question that maybe Viktors could answer.
[May 16, 2007 5:24:43 PM]