Former Member
Lost crunch time

Hi,
I use BOINC to run WCG, among other projects.
I got my first WU and, after a few days of crunching, I noticed that every time I turn my computer off, I lose quite a lot of 'CPU time' and 'Progress' on the FA@H WU.
I decided to test it and wrote down the figures. Before I shut my computer down they read 6h 45min / 50.9%, and after the restart 4h 55min / 48%. That's almost 2h of crunching lost... :-(
What's going on? With other projects I don't have this problem.
Does it have anything to do with the 'write to disk' frequency? I set it to 120, but that is supposed to mean 120 seconds, not 120 minutes!
Could someone please explain this to me? I find WCG science projects worth supporting, but I use my computer for short periods of time, so losing most of the work done every time I turn it off will make me stop running WCG.
Pirxx
[Oct 28, 2006 8:25:42 AM]
Former Member
Re: Lost crunch time

Hi Pirxx,
This depends on the project code and the work unit. The old HPF project wrote checkpoints every few minutes, so you never lost even 1% of a work unit's progress. FAAH is much less predictable. The computation has to reach a particular point in the code before it can checkpoint, and how quickly a work unit gets there varies a great deal. A checkpoint is written every time the green line in the graphic reaches the right edge and starts over; the first time this happens, a red line is drawn. For a few work units this never happens at all, while others do it every few minutes. Typically it happens 3 or 4 times an hour, but there is no guarantee, so a work unit can have 0, 1, 2, ... 30+ checkpoints. There is simply no telling in advance. When the task restarts, it resumes from the last checkpoint, so everything computed since then has to be redone. The 'write to disk' setting only limits how often checkpoints may be written; it cannot make the application checkpoint before it reaches one of these points.
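
To make the restart behavior concrete, here is a minimal Python sketch. It is not FAAH's actual code. It assumes the application can only checkpoint at the end of a cycle (the green line reaching the right edge) and models the 'write to disk' setting as a minimum spacing between checkpoints. The function names and cycle end times are hypothetical, chosen only to illustrate the arithmetic.

```python
from typing import List


def checkpoint_times(cycle_ends: List[float], min_interval: float) -> List[float]:
    """Return the CPU times (seconds) at which checkpoints get written.

    Assumptions: the app can only checkpoint at the end of a cycle, and it
    skips that chance if fewer than `min_interval` seconds have passed since
    the previous checkpoint (standing in for the 'write to disk' setting).
    """
    written: List[float] = []
    last = 0.0
    for t in cycle_ends:
        if t - last >= min_interval:
            written.append(t)
            last = t
    return written


def resume_point(stopped_at: float, checkpoints: List[float]) -> float:
    """CPU time a task restarts from: the latest checkpoint at or before `stopped_at`."""
    reached = [t for t in checkpoints if t <= stopped_at]
    return max(reached, default=0.0)  # no checkpoint yet means starting over


if __name__ == "__main__":
    # Hypothetical cycle end times in seconds; 17700 s = 4 h 55 min.
    cycles = [900.0, 1000.0, 5400.0, 17700.0, 30000.0]
    cps = checkpoint_times(cycles, min_interval=120.0)
    stopped_at = 6 * 3600 + 45 * 60  # 24300 s = 6 h 45 min
    resumed = resume_point(stopped_at, cps)
    print(f"lost {(stopped_at - resumed) / 60:.0f} min")  # lost 110 min
```

With one long, hypothetical cycle after the 4 h 55 min checkpoint, stopping at 6 h 45 min throws away 110 minutes, the same size of loss as in the figures above. Lowering `min_interval` would not help in this case, because no checkpoint opportunity falls in that window.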

As Didactylos says, the only way to be sure of capturing a work unit at an arbitrary point in its progress would be to save hundreds of megabytes: the whole process, together with all of its memory arrays from virtual memory. That just does not seem worth the trouble.
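
By contrast, an application-level checkpoint saves only the few values needed to resume, not the whole process image. Below is a minimal Python sketch, assuming the state fits in a small JSON-serializable dictionary; the function names, file name and state keys are hypothetical, not what any WCG project uses. Writing to a temporary file and then renaming it over the old one means an interruption mid-write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    """Write `state` to `path` so a reader sees either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # rename over the old checkpoint (atomic on POSIX)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # don't leave a half-written temp file behind
        raise


def load_checkpoint(path: str) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

A task would call `load_checkpoint("wu_state.json") or {"cycle": 0}` at startup and `save_checkpoint(...)` at each natural stopping point. The file stays a few bytes long, which is why this approach is so much cheaper than dumping the whole process, but it only works where the code has such stopping points.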

HDC is more consistent than FAAH, but it does some massive streaming I/O to store its checkpoints and uses much more disk space than FAAH. It probably could not checkpoint at all without this, but it causes some people so much trouble that they have to avoid HDC. The advance word is that some of the upcoming projects have much smaller requirements, but we will just have to wait and see what their checkpointing behavior is like.

Lawrence
[Oct 28, 2006 8:56:09 AM]