Thread Status: Active
Total posts in this thread: 12
Posts: 12   Pages: 2   [ 1 2 | Next Page ]
This topic has been viewed 1526 times and has 11 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
AC@H workunit keeps restarting at 0% after reaching 50%

Hi guys! I need some help/advice... my system has been attempting to crunch an AC@H job, and it seems to be having some sort of problem getting past 50%. It started crunching yesterday evening, and after 12 hours it was just at about 50% done, and it suddenly decided to restart from 0%.

Since my system had not been rebooted for a few days, I went ahead and rebooted it to make sure it was nice and fresh. Other than occasionally checking my email (or checking the forums), I pretty much just left the system alone all day to crunch while I've been puttering around elsewhere.

After running for 12 hours it again reached about 50% and BAM! It just restarted again at 0% for no darn reason that I can fathom.

Here is an excerpt from the log files. It pretty much looks the same all day long, with periodic checkpoints and occasional attempts to download additional AC@H workunits (which were apparently not available):

[...]
3/10/2008 3:25:57 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 3:46:14 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 4:06:38 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 4:24:39 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 4:51:59 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 5:16:59 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 5:35:32 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 5:45:36 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 233857 seconds of work, reporting 0 completed tasks
3/10/2008 5:45:41 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 5:45:41 PM|World Community Grid|Message from server: No work sent
3/10/2008 5:45:41 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 5:45:41 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 5:46:47 PM|World Community Grid|Fetching scheduler list
3/10/2008 5:46:52 PM|World Community Grid|Master file download succeeded
3/10/2008 5:46:57 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 234230 seconds of work, reporting 0 completed tasks
3/10/2008 5:47:02 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 5:47:02 PM|World Community Grid|Message from server: No work sent
3/10/2008 5:47:02 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 5:47:02 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 5:48:07 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 234605 seconds of work, reporting 0 completed tasks
3/10/2008 5:48:12 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 5:48:12 PM|World Community Grid|Message from server: No work sent
3/10/2008 5:48:12 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 5:48:12 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 5:49:18 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 234848 seconds of work, reporting 0 completed tasks
3/10/2008 5:49:23 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 5:49:23 PM|World Community Grid|Message from server: No work sent
3/10/2008 5:49:23 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 5:49:23 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 5:50:29 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 235149 seconds of work, reporting 0 completed tasks
3/10/2008 5:50:34 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 5:50:34 PM|World Community Grid|Message from server: No work sent
3/10/2008 5:50:34 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 5:50:34 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 5:51:40 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 235465 seconds of work, reporting 0 completed tasks
3/10/2008 5:51:45 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 5:51:45 PM|World Community Grid|Message from server: No work sent
3/10/2008 5:51:45 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 5:51:45 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 5:52:55 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 235836 seconds of work, reporting 0 completed tasks
3/10/2008 5:53:00 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 5:53:00 PM|World Community Grid|Message from server: No work sent
3/10/2008 5:53:00 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 5:53:00 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 5:56:46 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 236786 seconds of work, reporting 0 completed tasks
3/10/2008 5:56:51 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 5:56:51 PM|World Community Grid|Message from server: No work sent
3/10/2008 5:56:51 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 5:56:51 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 6:01:10 PM|World Community Grid|Restarting task ach1_24_29_3 using acah version 514
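
For anyone who wants to scan a longer log for the same pattern, here is a minimal sketch in Python. It assumes the `timestamp|project|message` layout seen in the excerpt above; the function names are hypothetical and the sample lines are copied from that excerpt. It lists each "Restarting task" message alongside the last checkpoint logged before it.

```python
from datetime import datetime

# Sample lines copied from the log excerpt above.
LOG = """\
3/10/2008 5:16:59 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 5:35:32 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 5:45:36 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 233857 seconds of work, reporting 0 completed tasks
3/10/2008 6:01:10 PM|World Community Grid|Restarting task ach1_24_29_3 using acah version 514
"""


def parse_line(line):
    """Split 'timestamp|project|message' into (datetime, project, message)."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), project, message


def summarize(log_text):
    """Return (checkpoint times, restart events) found in the log text."""
    checkpoints, restarts = [], []
    for line in log_text.splitlines():
        if line.count("|") < 2:
            continue  # skip blank lines and "[...]" markers
        when, _project, message = parse_line(line)
        if message.endswith("checkpointed"):
            checkpoints.append(when)
        elif message.startswith("Restarting task"):
            restarts.append((when, message))
    return checkpoints, restarts


checkpoints, restarts = summarize(LOG)
for when, message in restarts:
    earlier = [c for c in checkpoints if c < when]
    last = max(earlier) if earlier else None
    print(f"{when}: {message} (last checkpoint logged: {last})")
```

A restart that follows a recent checkpoint, as here, only shows that the client relaunched the task; it does not by itself say whether the application resumed from that checkpoint.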


I converted over from running UD to BOINC just a few weeks ago, so I'm still fumbling around learning BOINC's quirks and how it interacts on my system (an IBM T42 laptop, running XP). This was also the first AC@H workunit that I've managed to snag thus far. I have been able to successfully complete both DDDT and HCC workunits (all those were returned with "valid" status).

So my questions basically are...
1) Is this "normal" to see workunits suddenly restart like this (for either AC@H or BOINC in general)?
2) Should I continue to try to crunch this workunit or just give up?

Any suggestions or ideas on why this workunit won't finish for me would be greatly appreciated (and would also help to make me not regret moving from UD to BOINC too).

thanks,
-Gavi
[Mar 11, 2008 2:21:16 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: AC@H workunit keeps restarting at 0% after reaching 50%

Please will you post the stderr.txt file in the slots\0\ directory?
[Mar 11, 2008 2:32:57 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: AC@H workunit keeps restarting at 0% after reaching 50%

Please will you post the stderr.txt file in the slots\0\ directory?

Sure! Here you go:

Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::12:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/9::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::0:0:0 1
[Mar 11, 2008 2:44:59 AM]
retsof
Former Community Advisor
USA
Joined: Jul 31, 2005
Post Count: 6824
Status: Offline
Re: AC@H workunit keeps restarting at 0% after reaching 50%

techs: Compare other AC@H errors here

http://www.worldcommunitygrid.org/forums/wcg/...ad=18498&lastpage=yes

but this one seems to be nastier. I only had one marked "invalid" and none of these symptoms.
----------------------------------------
SUPPORT ADVISOR
Work+GPU i7 8700 12threads
School i7 4770 8threads
Default+GPU Ryzen 7 3700X 16threads
Ryzen 7 3800X 16 threads
Ryzen 9 3900X 24threads
Home i7 3540M 4threads50%
----------------------------------------
[Edit 1 times, last edit by retsof at Mar 11, 2008 3:31:34 AM]
[Mar 11, 2008 3:30:58 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: AC@H workunit keeps restarting at 0% after reaching 50%

Good morning/afternoon/evening all. Just thought I'd toss out more data to you, just in case it's at all helpful to anyone.

When I went to sleep last night, it was chugging along around 30% done. I vaguely remember waking up briefly around 2 AM and glancing at the progress to see it was back down to just around 5%, indicating that it had reset back to 0% again. I rolled over and fell back to sleep hoping it was just a bad dream.

Now that I've slumbered properly, I'm looking at the status and messages, and it appears that it did indeed reset and it has now worked itself back to 28% progress. Looking at the message logs, I don't see an actual message of "restarting" this time, but based on the sizes referenced in the "work fetch request" messages, something happened between 12:25 AM and 1:28 AM, as the requested size suddenly dropped to a small number after progressively growing hour by hour:

[...]
3/10/2008 11:01:13 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 11:13:48 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 170359 seconds of work, reporting 0 completed tasks
3/10/2008 11:13:53 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 11:13:53 PM|World Community Grid|Message from server: No work sent
3/10/2008 11:13:53 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 11:13:53 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 11:14:59 PM|World Community Grid|Fetching scheduler list
3/10/2008 11:15:04 PM|World Community Grid|Master file download succeeded
3/10/2008 11:15:09 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 171005 seconds of work, reporting 0 completed tasks
3/10/2008 11:15:14 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 11:15:14 PM|World Community Grid|Message from server: No work sent
3/10/2008 11:15:14 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 11:15:14 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 11:16:19 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 171650 seconds of work, reporting 0 completed tasks
3/10/2008 11:16:24 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 11:16:24 PM|World Community Grid|Message from server: No work sent
3/10/2008 11:16:24 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 11:16:24 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 11:17:28 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 172148 seconds of work, reporting 0 completed tasks
3/10/2008 11:17:33 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 11:17:33 PM|World Community Grid|Message from server: No work sent
3/10/2008 11:17:33 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 11:17:33 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 11:18:38 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 172804 seconds of work, reporting 0 completed tasks
3/10/2008 11:18:43 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 11:18:43 PM|World Community Grid|Message from server: No work sent
3/10/2008 11:18:43 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 11:18:43 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 11:19:52 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 173275 seconds of work, reporting 0 completed tasks
3/10/2008 11:20:02 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 11:20:02 PM|World Community Grid|Message from server: No work sent
3/10/2008 11:20:02 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 11:20:02 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 11:21:12 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 11:21:45 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 173035 seconds of work, reporting 0 completed tasks
3/10/2008 11:21:50 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 11:21:50 PM|World Community Grid|Message from server: No work sent
3/10/2008 11:21:50 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 11:21:50 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 11:25:21 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 175056 seconds of work, reporting 0 completed tasks
3/10/2008 11:25:26 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 11:25:26 PM|World Community Grid|Message from server: No work sent
3/10/2008 11:25:26 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 11:25:26 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/10/2008 11:40:26 PM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/10/2008 11:43:34 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 181331 seconds of work, reporting 0 completed tasks
3/10/2008 11:43:39 PM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/10/2008 11:43:39 PM|World Community Grid|Message from server: No work sent
3/10/2008 11:43:39 PM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/10/2008 11:43:39 PM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/11/2008 12:11:57 AM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/11/2008 12:25:44 AM|World Community Grid|Sending scheduler request: To fetch work. Requesting 191820 seconds of work, reporting 0 completed tasks
3/11/2008 12:25:49 AM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/11/2008 12:25:49 AM|World Community Grid|Message from server: No work sent
3/11/2008 12:25:49 AM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/11/2008 12:25:49 AM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/11/2008 12:36:26 AM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/11/2008 1:28:07 AM|World Community Grid|Sending scheduler request: To fetch work. Requesting 17548 seconds of work, reporting 0 completed tasks
3/11/2008 1:28:18 AM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/11/2008 1:28:18 AM|World Community Grid|Message from server: No work sent
3/11/2008 1:28:18 AM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/11/2008 1:28:18 AM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
3/11/2008 1:28:58 AM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/11/2008 1:50:54 AM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/11/2008 2:13:19 AM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/11/2008 2:28:29 AM|World Community Grid|Sending scheduler request: To fetch work. Requesting 45073 seconds of work, reporting 0 completed tasks
3/11/2008 2:28:34 AM|World Community Grid|Scheduler request succeeded: got 0 new tasks
3/11/2008 2:28:34 AM|World Community Grid|Message from server: No work sent
3/11/2008 2:28:34 AM|World Community Grid|Message from server: No work is available for AfricanClimate@Home
3/11/2008 2:28:34 AM|World Community Grid|Message from server: No work available for the applications you have selected. Please check your settings on the website.
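
The same check can be done mechanically. The sketch below (Python, with hypothetical function names and an arbitrary 50% threshold chosen for illustration) pulls the "Requesting N seconds" figure out of each scheduler request line and flags any request that is less than half the size of the previous one. The sample lines are copied from the excerpt above.

```python
import re

# Matches the scheduler request lines shown above and captures the
# timestamp and the number of seconds of work requested.
REQUEST_RE = re.compile(
    r"^(?P<stamp>[^|]+)\|[^|]*\|Sending scheduler request: "
    r".*Requesting (?P<secs>\d+) seconds"
)

LOG = """\
3/10/2008 11:43:34 PM|World Community Grid|Sending scheduler request: To fetch work. Requesting 181331 seconds of work, reporting 0 completed tasks
3/11/2008 12:25:44 AM|World Community Grid|Sending scheduler request: To fetch work. Requesting 191820 seconds of work, reporting 0 completed tasks
3/11/2008 12:36:26 AM|World Community Grid|[checkpoint_debug] result ach1_24_29_3 checkpointed
3/11/2008 1:28:07 AM|World Community Grid|Sending scheduler request: To fetch work. Requesting 17548 seconds of work, reporting 0 completed tasks
3/11/2008 2:28:29 AM|World Community Grid|Sending scheduler request: To fetch work. Requesting 45073 seconds of work, reporting 0 completed tasks
"""


def requested_seconds(log_text):
    """Return [(timestamp, seconds)] for every work-fetch request line."""
    found = []
    for line in log_text.splitlines():
        match = REQUEST_RE.match(line)
        if match:
            found.append((match.group("stamp"), int(match.group("secs"))))
    return found


def find_drops(requests, ratio=0.5):
    """Return consecutive pairs where the request fell below ratio * previous."""
    drops = []
    for (t0, s0), (t1, s1) in zip(requests, requests[1:]):
        if s1 < s0 * ratio:
            drops.append((t0, s0, t1, s1))
    return drops


drops = find_drops(requested_seconds(LOG))
for t0, s0, t1, s1 in drops:
    print(f"request fell from {s0} s ({t0}) to {s1} s ({t1})")
```

A sudden drop like this only narrows down when something changed; it is a hint about timing, not proof of what happened to the task.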

Here is the stderr.txt file also, in case that's of any use:

Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::12:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/9::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/8::6:0:0 1
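
To make the pattern in stderr.txt easier to see, here is a rough Python sketch that reduces it to one entry per application launch. It assumes each launch logs either "INFO: No state to restore from" or "Restarting WRF", followed by a line containing `Restart<year>/<month>/<day>::<hour>:<minute>:<second>`. Reading that stamp as the model time the run starts from is my interpretation of the output, not something the log states. The sample lines are copied from the file above.

```python
import re

STDERR = """\
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/9::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/11/5::0:0:0 1
"""

# Captures year, month, day, hour, minute, second from the "Restart..." stamp.
START_RE = re.compile(r"Restart(\d+)/(\d+)/(\d+)::(\d+):(\d+):(\d+)")


def launches(stderr_text):
    """Return one (mode, start_stamp) tuple per application launch."""
    result, mode = [], None
    for line in stderr_text.splitlines():
        if line.startswith("INFO: No state to restore from"):
            mode = "from scratch"
        elif line.startswith("Restarting WRF"):
            mode = "resumed"
        else:
            match = START_RE.search(line)
            if match and mode is not None:
                y, mo, d, h, mi, s = map(int, match.groups())
                stamp = f"{y:04d}-{mo:02d}-{d:02d} {h:02d}:{mi:02d}:{s:02d}"
                result.append((mode, stamp))
                mode = None
    return result


for mode, stamp in launches(STDERR):
    print(f"{mode:12s} start stamp {stamp}")
```

In the sample, a launch that resumed at 2003/11/9 is followed by one that starts from the beginning again, which matches the progress resets described above.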


If my continuing to allow this workunit to crunch-n-restart can help any of you support folks learn something that will help the project at large, I'll keep letting it run, despite being depressed that I've not returned any "real" work now for the last two days. Please let me know if there is any other data you want/need from me.

thanks again,
-Gavi
[Mar 11, 2008 3:22:53 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: AC@H workunit keeps restarting at 0% after reaching 50%

I wonder if you're hitting a memory or disk size constraint? No messages, but check this anyhow!

The reduction in the requested work time is a result of your machine making no progress, so BOINC 'learns' that it needs to ask for less. I'm sure that when you get through this, your client will be a little confused.

Anyway, if it's not a memory constraint and the job keeps resetting at 50%, I suggest aborting the job and resetting WCG in the Project tab, so BOINC forgets it went astray. Let someone else have a go at 24_29_3.

ttyl
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Mar 11, 2008 3:40:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: AC@H workunit keeps restarting at 0% after reaching 50%

Memory and diskspace both have plenty of elbow room. I even did some housekeeping to free up an additional half gig of space, just in case. Darn thing reset to 0% again this morning.

I guess the Universe does not want me to earn a happy-sun badge at this time, and wants me to work on getting a blood-sucking mosquito instead. I'll do what you suggested and hit the proverbial "i give up" button and release this work unit so someone else can hopefully have better luck at getting it processed for the scientists.

As far as my laptop being a little confused after this... well, that's no biggie. It will just be more like its operator!

Thanks to you all for your help. Even if you couldn't "fix" it, I appreciate the ever-friendly support and efforts.

-Gavi
[Mar 11, 2008 6:44:29 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: AC@H workunit keeps restarting at 0% after reaching 50%

Yes, it's time to abort it.

According to the log, your suspicions are correct - it is failing to restart from a checkpoint. Since you had checkpoint_debug on, we know it actually was checkpointing.

So, there's nothing we can do. Sorry for the wasted time.
[Mar 11, 2008 7:38:59 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: AC@H workunit keeps restarting at 0% after reaching 50%

Yes, it's time to abort it.

According to the log, your suspicions are correct - it is failing to restart from a checkpoint. Since you had checkpoint_debug on, we know it actually was checkpointing.

So, there's nothing we can do. Sorry for the wasted time.


Yup. Workunit aborted. The results status page shows it was returned with "Error" status and 21.56 hours of CPU time. Oh well... I hope whoever picks it up from the work queue is able to crunch it successfully to 100% completion. I've downloaded some more DDDT jobs to crunch instead.

Fortunately, I figured out how to turn on checkpoint_debug just about a week ago, so at least that was somewhat helpful in trying to see what this workunit was doing. If there are any other debug switches that you'd recommend I enable in case of future issues like this, please let me know.
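
For reference, a minimal Python sketch of how such switches could be written out. It assumes the BOINC client reads its log flags from a `cc_config.xml` file in its data directory, under a `<log_flags>` element. `checkpoint_debug` is the flag used in this thread; `task_debug` is included only as an example of another flag and should be checked against the documentation for your client version. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a cc_config.xml document that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text "1" means "enabled".
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


# task_debug is an assumption here; verify it exists in your client version.
config_xml = build_cc_config(["checkpoint_debug", "task_debug"])
print(config_xml)
```

The printed XML would still need to be saved as `cc_config.xml` and picked up by the client, for example by restarting it.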

Thanks again guys. You are still all awesome in my book.
-Gavi
[Mar 11, 2008 8:09:31 PM]
retsof
Former Community Advisor
USA
Joined: Jul 31, 2005
Post Count: 6824
Status: Offline
Re: AC@H workunit keeps restarting at 0% after reaching 50%

Just an aside, speaking of disk: today, this AMD X2 dual-core happened to be running two AC@H workunits simultaneously. It's got 4 GB and about 7 GB swap, and the swap and/or disk writes were VERY busy compared to just running one. The write and checkpoint disk files are very large compared to other usual projects, and the uploads and downloads are also very large. Page faults ranged from 0 to 7100 per second, but did not seem to hurt anything.
[Edit 2 times, last edit by retsof at Mar 12, 2008 3:14:11 AM]
[Mar 12, 2008 3:13:30 AM]