Thread Status: Active
Total posts in this thread: 12
Posts: 12   Pages: 2
This topic has been viewed 3845 times and has 11 replies
Recluce
Cruncher
Joined: Nov 22, 2009
Post Count: 9
Status: Offline
No Checkpointing?

I had a system crash. While the other projects recovered basically where they left off, the Genome Mysteries project simply threw away 21 hours of CPU time per task and started from zero.

That and the frequent disk writes made me opt out of the project for now. The geniuses that wrote the UGM application do realize that people have SSDs nowadays, don't they?
[Nov 19, 2014 4:13:36 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No Checkpointing?

If the recluse came out to snark: yes, the geniuses have announced that an alpha is in progress, expected to go to beta soon, which syncs the small writes to the checkpoint 'write to disk at most' setting. That was not on account of your SSD, though; the frequent writes are causing a performance hit.
[Nov 19, 2014 4:46:40 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No Checkpointing?

P.S. What present-day, SSD-equipped machine takes 21 hours on a UGM task and does not checkpoint? There could be something wrong with your host, considering the project mean is 3.5 hours of runtime per result. Good that you opted out, as wasting time on a project a device cannot handle is not a good thing.
[Nov 19, 2014 4:53:57 PM]
Recluce
Cruncher
Joined: Nov 22, 2009
Post Count: 9
Status: Offline
Re: No Checkpointing?

And here I thought that a project like Boinc attracts supportive, POSITIVE characters. Obviously it attracts quite the opposite type as well. Lavaflow, go hide under a rock.

So confirmed: UGM is great to kill SSDs early and does not checkpoint. The latter would explain the runtimes.
[Nov 20, 2014 6:30:33 AM]
Eric_Kaiser
Veteran Cruncher
Germany (Hessen)
Joined: May 7, 2013
Post Count: 1047
Status: Offline
Re: No Checkpointing?

What kind of computer are we talking about that takes more than 21 hrs to complete a UGM task?
I have two AMD 5350 (Kabini) devices running at 2 GHz. These are cheap, low-power CPUs, but even on those slow CPUs the runtime for UGM is between 5 and 8 hrs per task.
I bet my Android devices wouldn't be much slower if there were UGM for Android.
So how could your devices need more than 21 hrs?
[Nov 20, 2014 6:53:26 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No Checkpointing?

Recluce wrote: "And here I thought that a project like Boinc attracts supportive, POSITIVE characters. Obviously it attracts quite the opposite type as well. Lavaflow, go hide under a rock.

So confirmed: UGM is great to kill SSDs early and does not checkpoint. The latter would explain the runtimes."

Yup, kicking off with 'The geniuses that wrote the UGM' painted you right under that cornerstone straight away. Thought I'd join you for a little while from under mine. (laughing)

Eric, you ask the same question, but apart from the complaint and the mention of an SSD, there is no info from Recluce. I have done nearly 2000 UGM tasks now without a single hang, and my mean runtime is about the same as the project average, on plain hard-disk storage.
----------------------------------------
[Edit 2 times, last edit by Former Member at Nov 20, 2014 8:05:40 AM]
[Nov 20, 2014 8:03:26 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No Checkpointing?

Come on children, put the pea-shooters away ...

First off, UGM does checkpoint, very regularly, every 10 minutes. I've never set WTD to longer than 10 minutes so I don't know for sure that it respects that -- mine's currently set to 3 minutes -- but I have no reason to think that it doesn't. And, as has already been said, the issue with mini-writes every few seconds is being addressed and a fix is in alpha test right now.
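To make the idea concrete, here is a minimal sketch of time-throttled checkpointing in Python. It is not how UGM or the BOINC client is implemented; the function names, the JSON state file and the `min_interval_s` parameter (standing in for the "write to disk at most" setting) are all assumptions for illustration.

```python
import json
import os
import tempfile
import time


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"next_step": 0, "total": 0}


def run(path, steps, min_interval_s):
    """Resume from the last checkpoint and write a new one at most every min_interval_s."""
    state = load_checkpoint(path)
    last_write = time.monotonic()
    for step in range(state["next_step"], steps):
        state["total"] += step  # stand-in for the real computation
        state["next_step"] = step + 1
        if time.monotonic() - last_write >= min_interval_s:
            save_checkpoint(path, state)
            last_write = time.monotonic()
    save_checkpoint(path, state)  # final state once the work is done
    return state["total"]
```

After a crash, calling `run` again with the same path picks up at the last saved `next_step`, so at most one interval of work is lost. Writing to a temporary file and renaming it means a crash in the middle of a write leaves the previous checkpoint intact.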

What we do know about UGM is that it uses a more modern instruction that's not available on very old computers, so you might be hitting that. But I have a P4 that runs these OK and even the largest units have finished in around 20 hours CPU.

I wonder if you are confusing wall-time with CPU time and your processor was "stuck"? What happened after the re-boot, exactly?
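A quick way to see the difference between the two clocks, as a sketch using only Python's standard library (the helper names are made up for this example): a task that is waiting rather than computing accumulates wall-time but almost no CPU time.

```python
import time


def measure(work):
    """Return (wall_seconds, cpu_seconds) spent running work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def stalled():
    time.sleep(0.5)  # waiting: the wall clock advances, CPU time barely moves


def busy():
    sum(i * i for i in range(500_000))  # computing: both clocks advance


for name, work in (("stalled", stalled), ("busy", busy)):
    wall, cpu = measure(work)
    print(f"{name}: wall {wall:.2f} s, cpu {cpu:.2f} s")
```

If a task shows 21 hours elapsed but only a small amount of CPU time, the processor was most likely not working on it for much of that period.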

If you'd like the community to help you please post the start-up lines from the BOINC message log along with anything else that might help. If this unit is long gone, then I suggest you wait for the new code and then try this project again to re-evaluate its behaviour.

People will be more inclined to help you if you are just a little more polite. Don't vent your spleen until you have some proper ammunition, backed up with data, and not just supposition.
[Nov 20, 2014 10:28:22 AM]
l_mckeon
Senior Cruncher
Joined: Oct 20, 2007
Post Count: 439
Status: Offline
Re: No Checkpointing?

If BoincTasks is to be believed, UGM checkpoints every 60 seconds. MCM checkpoints every ten minutes.

Last weekend I ran PrivaZer prior to doing a backup, and I found on restarting that all previous work had gone and the running tasks had to start over.

Apparently UGM stores its work in progress somewhere that is susceptible to damage. Not a big deal in my opinion.
[Nov 20, 2014 11:40:32 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No Checkpointing?

The checkpoint writing properly follows the 'write to disk at most' setting, which defaults to 60 seconds. That was OK back in the days of single and dual cores, but it is not good when running 4-8 threads or more: then a checkpoint gets written every 8-15 seconds. This is why having this setting at 300-600 seconds is better, as it means fewer writes; I have my WTDAM at 1000 seconds myself. I'm not worried about incidental losses on boot or client restart.
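The arithmetic behind those numbers can be sketched as follows. This assumes every task writes exactly once per interval and that the writes are evenly spread, which is an idealisation; the function name is made up for this example.

```python
def mean_seconds_between_writes(interval_s: float, concurrent_tasks: int) -> float:
    """Average gap between checkpoint writes on the host as a whole.

    Assumes each task writes once per interval and the writes are
    spread evenly across it.
    """
    if concurrent_tasks < 1:
        raise ValueError("need at least one running task")
    return interval_s / concurrent_tasks


for tasks in (1, 4, 8):
    for interval in (60, 600, 1000):
        gap = mean_seconds_between_writes(interval, tasks)
        print(f"{tasks} tasks, {interval:>4} s interval -> one write every {gap:.1f} s")
```

With the 60-second default, 4 tasks give one write every 15 seconds and 8 tasks one every 7.5 seconds, which is roughly the 8-15 second range mentioned above.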

Since the checkpoint writes run like clockwork from system start or restart, this creates a bottleneck when 4-8 or more tasks write their checkpoints simultaneously, until the concurrent UGM tasks desync their writes over time, which happens at the latest when they finish and the next ones start.
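A toy model of that effect, not a measurement of the real client: every task writes on a fixed interval counted from its own start, so tasks that start together keep colliding, while tasks with different start offsets do not. The function name and the offsets are hypothetical.

```python
from collections import Counter


def peak_simultaneous_writes(start_offsets_s, interval_s, horizon_s):
    """Largest number of checkpoint writes landing in the same second."""
    writes = Counter()
    for offset in start_offsets_s:
        t = offset + interval_s
        while t <= horizon_s:
            writes[t] += 1
            t += interval_s
    return max(writes.values(), default=0)


interval = 60    # seconds between checkpoints per task
horizon = 3600   # simulate one hour

synced = [0] * 8                        # eight tasks started together after a reboot
staggered = [i * 7 for i in range(8)]   # hypothetical offsets once tasks have drifted apart

print("synced peak:", peak_simultaneous_writes(synced, interval, horizon))
print("staggered peak:", peak_simultaneous_writes(staggered, interval, horizon))
```

In this model the synced case has all eight tasks hitting the disk in the same second, and the staggered case never has more than one.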

As for why the original poster had no checkpointing: pass. Maybe it is a device without SSE2 (P3-level CPU), in which case computing would fail for UGM. It's one of those 'very rare' reports, hardly reproducible in test labs. Knowing the exact computing environment would help for starters. But I don't expect this single failure, out of over 6.3 million completed results, to kick off an investigation.

Btw, some recent versions of BoincTasks have a bug in the checkpoint counter. I'm now running 1.67, which appears to increment it properly.
----------------------------------------
[Edit 1 times, last edit by Former Member at Nov 21, 2014 8:11:33 AM]
[Nov 21, 2014 8:05:49 AM]
l_mckeon
Senior Cruncher
Joined: Oct 20, 2007
Post Count: 439
Status: Offline
Re: No Checkpointing?

Thanks for the info. I've noticed the BoincTasks counter bug but I'll wait until 1.67 comes out of beta.
[Nov 22, 2014 1:45:46 AM]