Thread Status: Active
Total posts in this thread: 23
This topic has been viewed 1926 times and has 22 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Time between saves or checkpoints

Is anyone else seeing a big increase in the time between saves or checkpoints for this project? Before today, I was seeing intervals of a few minutes to maybe 20 minutes. Now I'm seeing intervals of several hours (or as much as 50% of the work unit).
[Jul 11, 2006 12:27:04 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Time between saves or checkpoints

It has been advised that HPF2 work units are very demanding, such that a single segment or chunk takes a very long time, hence the restart points (checkpoints) lying far apart in time. Imagine an HPF2 with as many segments as an HPF1 going bust in the 11th hour: a big waste. The startup problems prove the foresight was right not to cram too many into one.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Jul 11, 2006 6:15:13 AM]
[Jul 11, 2006 6:13:39 AM]
Wikkel
Cruncher
Joined: Dec 28, 2005
Post Count: 9
Status: Offline
Re: Time between saves or checkpoints

Same here...
I have three devices running, and they are all processing a FAAH chunk that runs for less than 2 hours up to a progress of 50%. Then the progress falls back to 48%, runs up to 50%, falls back to 48%, etc.
When a device is power cycled the chunk starts all over again at 0%.
[Jul 11, 2006 7:24:48 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Time between saves or checkpoints

FAAH work units are segmented entirely differently, into smaller pieces of 1 or 2%. Progress should be seen steadily with time. In the UD Agent ** progress can be seen under the 'i' button in steps of 0.1% (the graph screen).

The 48, 50, 48, 50 pattern suggests it's attempting to complete a segment of 2% or thereabouts, but not succeeding. I've never seen this phenomenon before, and certainly haven't heard of anyone showing this on 3 machines of 1 user at the same time. FAAH is generally very steady.

If it does this for an extended time, I'd abort by killing the UD_7xxxxxx.exe process in Task Manager and fetch a new WU (Work Unit), which it should do automatically. But let's see if a WCG technician picks up on this first.

** Guessing you have UD Agent, not BOINC. Power cycling runs the risk of damaging the WU; it's best to exit UD properly first.

EDIT 12.55pm 7.11.06: I'm seeing the same now, by coincidence, on the 'i' screen of a FAAH on UD. It progresses slowly from 47.7 to 50.0%, then skips back to 47.7. It has now cycled through that 4 times in the last 2 hours. Unless I get a quick reply, I'm sending this one to the eternal hunting grounds, in the prescribed manner. Device is 160465, FAAH v 4.0.3.4, WU download approx. 01:21am UTC.
[Edit 7 times, last edit by Sekerob at Jul 11, 2006 11:25:48 AM]
[Jul 11, 2006 7:57:00 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Time between saves or checkpoints

The graph falling back to a lower percent completed is explained here: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=6129#49541
But I do not understand why FAAH would not have a checkpoint set well above 0%.
Lawrence
[Jul 11, 2006 1:37:28 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Time between saves or checkpoints

I am running FAAH under UD Agent on two dedicated machines 24/7. I usually connect to the Internet 2 or 3 times a day to send completed WUs. Yesterday, one work unit took about 10 hours with only one checkpoint after about 5 hours. If someone is running a slightly slower processor and/or is doing other work on the machine, it is entirely possible that the program would never reach the first checkpoint in an 8-10 hour work day. If these extended checkpoint intervals are really necessary, then the project may lose a very significant amount of processing time.
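
To make the concern concrete, here is a rough sketch of how many daily sessions a work unit would need when progress is only kept at checkpoints. The function name and the model are illustrative assumptions, not anything taken from the UD Agent: it assumes every session has the same length and that all work since the last checkpoint is lost when the session ends.

```python
def sessions_to_finish(checkpoints_h, total_h, session_h, max_sessions=100):
    """Count the sessions needed to finish a work unit.

    checkpoints_h: compute hours at which a checkpoint is saved (assumed)
    total_h:       compute hours needed for the whole work unit
    session_h:     compute hours available per session
    Returns None if the work unit can never finish.
    """
    saved = 0.0  # compute hours preserved by the latest checkpoint
    for n in range(1, max_sessions + 1):
        reached = saved + session_h
        if reached >= total_h:
            return n
        # Only the latest checkpoint at or before `reached` survives a restart.
        new_saved = max([c for c in checkpoints_h if c <= reached] + [saved])
        if new_saved == saved:
            # No new checkpoint was reached, so every later session
            # repeats the same work and ends in the same place.
            return None
        saved = new_saved
    return None


# The 10-hour work unit described above, with one checkpoint at 5 hours:
print(sessions_to_finish([5.0], 10.0, 8.0))  # two 8-hour sessions
print(sessions_to_finish([5.0], 10.0, 4.0))  # never finishes
```

Under these assumptions, a machine that crunches less than 5 hours per session never gets past the first checkpoint, which is the lost-time scenario described above.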
[Jul 11, 2006 4:26:12 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Time between saves or checkpoints

FAAH checkpoints as often as it can.

The problem with checkpointing at any random point is that the program maintains massive, fiendishly complicated data structures and call stacks in memory, and saving or restoring this state is non-trivial and would need huge checkpoint files. Instead, it waits until it reaches the end of a particular part of the computation, where it can save a relatively small file.

Your concern is valid, and it is part of the reasoning behind the minimum requirements for the project. Very occasionally we have to advise someone to leave their computer on for longer than usual while FAAH gets past a tricky spot.
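
The general idea of saving only at the end of a segment can be sketched as follows. This is a minimal illustration, not the FAAH code: the function name, the JSON checkpoint format, and the stand-in computation are all assumptions made for the example.

```python
import json
import os


def run_work_unit(segments, checkpoint_path):
    """Process segments in order, saving a small checkpoint after each one."""
    # Resume from the last completed segment if a checkpoint exists.
    state = {"next_segment": 0, "results": []}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_segment"], len(segments)):
        # Work inside a segment lives only in memory. If the process is
        # killed here, everything since the last checkpoint is redone.
        result = sum(x * x for x in segments[i])  # stand-in for real work
        state["results"].append(result)
        state["next_segment"] = i + 1

        # Write to a temporary file and rename it, so a crash during
        # the write cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["results"]
```

The checkpoint holds only the index of the next segment and the finished results, which is why it stays small. The cost is that a long segment means a long gap between checkpoints.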
[Jul 11, 2006 4:40:49 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Time between saves or checkpoints

Didactylos:

Thanks for the explanation. I understand the reason, but it's the magnitude of the change that is difficult to fathom. I was seeing segment sizes of 1-2% which would correspond to checkpoints of about 12 minutes on my slow machine. Now I'm seeing a segment size of 50% or a checkpoint of about 7 hours on average. BTW, there is a post on another thread of a user who completes 40+% of a work unit and starts over at 0% after a proper exit and restart. Could be the same problem.
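
As a rough check of those numbers, the snippet below converts a segment size into a checkpoint interval. It assumes progress is linear in time, which the thread does not confirm, and the function name is made up for the example.

```python
def checkpoint_interval_minutes(total_runtime_hours, segment_percent):
    """Minutes between checkpoints, assuming progress is linear in time."""
    return total_runtime_hours * 60 * segment_percent / 100


# A 50% segment taking about 7 hours implies roughly 14 hours per work unit.
total_hours = 7 / 0.50

for percent in (1, 2, 50):
    minutes = checkpoint_interval_minutes(total_hours, percent)
    print(f"{percent}% segment: about {minutes:.1f} minutes")
```

On a 14-hour work unit, 1-2% segments work out to roughly 8 to 17 minutes, consistent with the 12 minutes quoted above, while a 50% segment is 420 minutes, or 7 hours.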
[Jul 11, 2006 5:50:59 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Time between saves or checkpoints

FAAH work units do vary in size. There are a few absolute whoppers out there.
[Jul 11, 2006 6:53:27 PM]
Wikkel
Cruncher
Joined: Dec 28, 2005
Post Count: 9
Status: Offline
Re: Time between saves or checkpoints

Ok, thank you all for the additional info.
2 out of 3 devices seem to be past the 50% barrier after 4 hours of crunching.
So, it seems I was a bit impatient.....
[Jul 11, 2006 7:46:34 PM]