Thread Status: Active
Total posts in this thread: 8
medi
Cruncher
Joined: Jun 11, 2006
Post Count: 3
Status: Offline
Problems with a FAAH WU?

Hi, I think I have a problem with a WU. In two days the progress is only 4%, and sometimes the work restarts from 0. Why? And what can I do?

I hope you can understand me.

In case it helps, this is today's BOINC log.

22/08/2006 8.53.42||Starting BOINC client version 5.4.11 for windows_intelx86
22/08/2006 8.53.42||libcurl/7.15.3 OpenSSL/0.9.8a zlib/1.2.3
22/08/2006 8.53.42||Data directory: E:\Programmi\BOINC
22/08/2006 8.53.43||Processor: 1 AuthenticAMD AMD Athlon(tm) XP 2000+
22/08/2006 8.53.43||Memory: 511.48 MB physical, 1.97 GB virtual
22/08/2006 8.53.43||Disk: 6.79 GB total, 3.89 GB free
22/08/2006 8.53.43|rosetta@home|URL: http://boinc.bakerlab.org/rosetta/; Computer ID: 234711; location: ; project prefs: default
22/08/2006 8.53.43|boincsimap|URL: http://boinc.bio.wzw.tum.de/boincsimap/; Computer ID: 27168; location: home; project prefs: default
22/08/2006 8.53.43|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 47234; location: ; project prefs: default
22/08/2006 8.53.44||General prefs: from http://lhcathome.cern.ch/ (last modified 2006-05-22 13:25:45)
22/08/2006 8.53.44||General prefs: no separate prefs for home; using your defaults
22/08/2006 8.53.44||Local control only allowed
22/08/2006 8.53.44||Listening on port 31416
22/08/2006 8.53.44|World Community Grid|Deferring task faah0765_bdb395_mx1gno_0C_2
22/08/2006 8.53.44|rosetta@home|Deferring task FRA_t368_CASPR_hom001_7_t368_7_dec24IGNORE_THE_REST_1_1179_277_0
22/08/2006 8.53.44|boincsimap|Resuming task 60801004.004930_2 using simap version 510
22/08/2006 8.53.45|World Community Grid|Restarting task faah0765_bdb395_mx1gno_0C_2 using faah version 510
22/08/2006 8.53.45|boincsimap|Pausing task 60801004.004930_2 (removed from memory)
22/08/2006 9.53.44|World Community Grid|Pausing task faah0765_bdb395_mx1gno_0C_2 (removed from memory)
22/08/2006 9.53.44|rosetta@home|Restarting task FRA_t368_CASPR_hom001_7_t368_7_dec24IGNORE_THE_REST_1_1179_277_0 using rosetta version 525
22/08/2006 10.53.45|rosetta@home|Pausing task FRA_t368_CASPR_hom001_7_t368_7_dec24IGNORE_THE_REST_1_1179_277_0 (removed from memory)
22/08/2006 10.53.45|boincsimap|Restarting task 60801004.004930_2 using simap version 510
22/08/2006 10.56.00||Rescheduling CPU: application exited
22/08/2006 10.56.00|boincsimap|Computation for task 60801004.004930_2 finished
22/08/2006 10.56.01|boincsimap|Starting task 60801005.001770_0 using simap version 510
22/08/2006 10.56.03|boincsimap|Started upload of file 60801004.004930_2_0
22/08/2006 10.56.30|boincsimap|Finished upload of file 60801004.004930_2_0
22/08/2006 10.56.30|boincsimap|Throughput 26639 bytes/sec
22/08/2006 11.56.02|World Community Grid|Restarting task faah0765_bdb395_mx1gno_0C_2 using faah version 510
22/08/2006 11.56.02|boincsimap|Pausing task 60801005.001770_0 (removed from memory)
22/08/2006 12.56.02|World Community Grid|Pausing task faah0765_bdb395_mx1gno_0C_2 (removed from memory)
22/08/2006 12.56.02|rosetta@home|Restarting task FRA_t368_CASPR_hom001_7_t368_7_dec24IGNORE_THE_REST_1_1179_277_0 using rosetta version 525
22/08/2006 13.20.36|boincsimap|Sending scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi
22/08/2006 13.20.36|boincsimap|Reason: To report completed tasks
22/08/2006 13.20.36|boincsimap|Reporting 1 tasks
22/08/2006 13.20.41|boincsimap|Scheduler request succeeded
22/08/2006 13.56.02|rosetta@home|Pausing task FRA_t368_CASPR_hom001_7_t368_7_dec24IGNORE_THE_REST_1_1179_277_0 (removed from memory)
22/08/2006 13.56.03|boincsimap|Restarting task 60801005.001770_0 using simap version 510
22/08/2006 14.49.01|boincsimap|Sending scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi
22/08/2006 14.49.01|boincsimap|Reason: To fetch work
22/08/2006 14.49.01|boincsimap|Requesting 16 seconds of new work
22/08/2006 14.49.07|boincsimap|Scheduler request succeeded
22/08/2006 14.49.09|boincsimap|Started download of file 60801005.012121
22/08/2006 14.49.45|boincsimap|Finished download of file 60801005.012121
22/08/2006 14.49.45|boincsimap|Throughput 56780 bytes/sec
22/08/2006 14.49.46||Rescheduling CPU: files downloaded
22/08/2006 14.49.46|World Community Grid|Restarting task faah0765_bdb395_mx1gno_0C_2 using faah version 510
22/08/2006 14.49.46|boincsimap|Pausing task 60801005.001770_0 (removed from memory)
22/08/2006 15.49.46|World Community Grid|Pausing task faah0765_bdb395_mx1gno_0C_2 (removed from memory)
22/08/2006 15.49.46|rosetta@home|Restarting task FRA_t368_CASPR_hom001_7_t368_7_dec24IGNORE_THE_REST_1_1179_277_0 using rosetta version 525
22/08/2006 16.38.21|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
22/08/2006 16.38.21|rosetta@home|Reason: To fetch work
22/08/2006 16.38.21|rosetta@home|Requesting 8640 seconds of new work
22/08/2006 16.38.22||Rescheduling CPU: application exited
22/08/2006 16.38.22|rosetta@home|Computation for task FRA_t368_CASPR_hom001_7_t368_7_dec24IGNORE_THE_REST_1_1179_277_0 finished
22/08/2006 16.38.22|boincsimap|Restarting task 60801005.001770_0 using simap version 510
22/08/2006 16.38.24|rosetta@home|Started upload of file FRA_t368_CASPR_hom001_7_t368_7_dec24IGNORE_THE_REST_1_1179_277_0_0
22/08/2006 16.38.26|rosetta@home|Scheduler request succeeded
22/08/2006 16.38.28|rosetta@home|Finished upload of file FRA_t368_CASPR_hom001_7_t368_7_dec24IGNORE_THE_REST_1_1179_277_0_0
22/08/2006 16.38.28|rosetta@home|Throughput 10814 bytes/sec
22/08/2006 16.38.28|rosetta@home|Started download of file t368_7_dec77_1.pdb.gz
22/08/2006 16.38.28|rosetta@home|Started download of file t368_7_dec77_1.loopfile.gz
22/08/2006 16.38.30|rosetta@home|Finished download of file t368_7_dec77_1.loopfile.gz
22/08/2006 16.38.30|rosetta@home|Throughput 180 bytes/sec
22/08/2006 16.38.31|rosetta@home|Finished download of file t368_7_dec77_1.pdb.gz
22/08/2006 16.38.31|rosetta@home|Throughput 24604 bytes/sec
22/08/2006 16.38.33||Rescheduling CPU: files downloaded
22/08/2006 16.58.05||Rescheduling CPU: application exited
22/08/2006 16.58.05|boincsimap|Computation for task 60801005.001770_0 finished
22/08/2006 16.58.05|rosetta@home|Starting task FRA_t368_CASPR_hom001_7_t368_7_dec77IGNORE_THE_REST_1_1179_710_0 using rosetta version 525
22/08/2006 16.58.07|boincsimap|Started upload of file 60801005.001770_0_0
22/08/2006 16.59.00|boincsimap|Finished upload of file 60801005.001770_0_0
22/08/2006 16.59.00|boincsimap|Throughput 25873 bytes/sec
22/08/2006 17.58.05|rosetta@home|Pausing task FRA_t368_CASPR_hom001_7_t368_7_dec77IGNORE_THE_REST_1_1179_710_0 (removed from memory)
22/08/2006 17.58.05|boincsimap|Starting task 60801005.012121_3 using simap version 510
22/08/2006 18.58.05|World Community Grid|Restarting task faah0765_bdb395_mx1gno_0C_2 using faah version 510
22/08/2006 18.58.05|boincsimap|Pausing task 60801005.012121_3 (removed from memory)
22/08/2006 19.02.32|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
22/08/2006 19.02.32|rosetta@home|Reason: To report completed tasks
22/08/2006 19.02.32|rosetta@home|Reporting 1 tasks
22/08/2006 19.02.38|rosetta@home|Scheduler request succeeded
22/08/2006 19.23.01|boincsimap|Sending scheduler request to http://boinc.bio.wzw.tum.de/boincsimap_cgi/cgi
22/08/2006 19.23.01|boincsimap|Reason: To report completed tasks
22/08/2006 19.23.01|boincsimap|Reporting 1 tasks
22/08/2006 19.23.06|boincsimap|Scheduler request succeeded

[Aug 22, 2006 5:41:20 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Problems with a FAAH WU?

Medi,

IMHO, reading your log, the FAAH WU never manages to reach a segment save point before it gets paused. Two options:

1. In your WCG BOINC profile, set the switch time to e.g. 12 hours. It should then be able to do one or more segments, if not the whole FAAH.
2. Obtain an alpha copy of the 5.5.x BOINC agent, if one exists for your OS. It takes segment save points into account (it waits for them).

My philosophy is that if you choose to run multiple projects, you can let it balance the time share over a longer period. BOINC will do it for you, so at the end of the week, fortnight or whatever, each project gets its share according to the weight you gave it.

ciao
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Aug 23, 2006 10:11:07 AM]
[Aug 22, 2006 6:46:21 PM]
medi
Cruncher
Joined: Jun 11, 2006
Post Count: 3
Status: Offline
Re: Problems with a FAAH WU?

But this is the first time this problem has occurred, with the same settings.

Moreover, it seems to me that the BOINC agent waits for a checkpoint before pausing a task to start another project, but I'm not sure.

I don't think running as an alpha tester is a better idea: I could run into many problems and lose computing time.

Well, if the problem is still present today, I will try the first option. I have always completed a FAAH WU in 12 hours.


thank you
----------------------------------------
[Edit 1 times, last edit by medi at Aug 23, 2006 7:25:58 AM]
[Aug 23, 2006 7:24:45 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Problems with a FAAH WU?

Before version 5.5.x, BOINC did not wait for a checkpoint. However, Sekerob's first option will accomplish the same thing. You also have a third option: just abort the WU, as it may be defective.

The techs can take note of the WU name and your computer ID from your log report and investigate further if they wish.
[Aug 23, 2006 9:09:23 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Problems with a FAAH WU?

BOINC doesn't wait for a checkpoint. As you can see from the log, BOINC is switching religiously every hour. This is far too quick for some of the WCG projects, which checkpoint infrequently.

Set the timeslot to at least 6 hours, and you should be fine.

A future version (now in alpha) takes the timeslot setting more as a guideline, and will run a work unit longer to reach a checkpoint if it can.
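
The hourly switching can be checked directly from the posted log. The following is a minimal Python sketch, not part of BOINC: the function name `run_slices` is made up for illustration, and the sample lines are the FAAH entries copied from the log at the top of the thread. It pairs each "Restarting task" line with the next "Pausing task" line for the same task and reports how long the task ran in between.

```python
from datetime import datetime

# FAAH lines copied from the log posted above (other projects omitted).
SAMPLE_LOG = """\
22/08/2006 8.53.45|World Community Grid|Restarting task faah0765_bdb395_mx1gno_0C_2 using faah version 510
22/08/2006 9.53.44|World Community Grid|Pausing task faah0765_bdb395_mx1gno_0C_2 (removed from memory)
22/08/2006 11.56.02|World Community Grid|Restarting task faah0765_bdb395_mx1gno_0C_2 using faah version 510
22/08/2006 12.56.02|World Community Grid|Pausing task faah0765_bdb395_mx1gno_0C_2 (removed from memory)
22/08/2006 14.49.46|World Community Grid|Restarting task faah0765_bdb395_mx1gno_0C_2 using faah version 510
22/08/2006 15.49.46|World Community Grid|Pausing task faah0765_bdb395_mx1gno_0C_2 (removed from memory)
"""


def run_slices(log_text, task):
    """Return the length in minutes of each start -> pause slice for `task`.

    Hypothetical helper: assumes the 'date time|project|message' layout
    seen in the posted log, with times written as H.M.S.
    """
    slices = []
    started = None
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3 or task not in parts[2]:
            continue
        when = datetime.strptime(parts[0], "%d/%m/%Y %H.%M.%S")
        message = parts[2]
        if message.startswith(("Starting task", "Restarting task", "Resuming task")):
            started = when
        elif message.startswith("Pausing task") and started is not None:
            slices.append((when - started).total_seconds() / 60)
            started = None
    return slices


slices = run_slices(SAMPLE_LOG, "faah0765_bdb395_mx1gno_0C_2")
print([round(minutes, 1) for minutes in slices])
```

Each slice comes out at about 60 minutes, and every pause is logged as "(removed from memory)", so any work done since the last checkpoint is discarded at each switch.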
[Aug 23, 2006 9:13:16 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Problems with a FAAH WU?

Medi,

One more possibility, call it option 4 (option 3 is printed in red ink): there's a BOINC profile setting to keep a WU in memory when paused. Apart from the fact that, with the HDC WUs, it could keep up to 750 MB of RAM/virtual memory occupied, the hourly switching might then not lose you the time. Others use(d) it, but on my machine the WU gets corrupted if, e.g., the periodic benchmark kicks in.

I can't find a discussion anywhere on this forum of BOINC's 'LTD' (Long Term Debt) and 'STD' (Short Term Debt) work scheduling. This is the accounting that BOINC performs for you over a longer period. If, e.g., the 'Weight' of WCG is 25%, Rosetta 50% and SIMAP 25%, then at the end of the week each should have total hours close to 24*7*Weight. If for some reason a project gets too much time, BOINC will stop that project for a while until it is back in balance with its weight. This can happen with WUs that are close to their deadline: they get first priority.
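
To make the 24*7*Weight arithmetic concrete, here is a small Python sketch using the example weights above. It only computes the weekly target hours per project. It is not BOINC's debt algorithm, and it assumes a single CPU crunching around the clock; the names `weekly_targets` and `HOURS_PER_WEEK` are made up for illustration.

```python
HOURS_PER_WEEK = 24 * 7  # assumes the machine crunches 24/7 on one CPU

# Example weights from the post above (resource share as a fraction).
weights = {"WCG": 0.25, "Rosetta": 0.50, "SIMAP": 0.25}


def weekly_targets(weights, hours=HOURS_PER_WEEK):
    """Split `hours` between projects in proportion to their weights."""
    total = sum(weights.values())
    return {name: hours * weight / total for name, weight in weights.items()}


targets = weekly_targets(weights)
for name, target in targets.items():
    print(f"{name}: {target:.0f} h/week")
```

With these weights, WCG and SIMAP each aim for 42 hours a week and Rosetta for 84, however the individual switches fall during the week.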

There are FAAHs around that have no segments, so even if one is interrupted at 99.999%, on restart it goes back to zero. On my machine a FAAH takes 7.5 to 8.5 hours of CPU time, which translates to somewhat less than 12 hours in real life, hence the proposed wall-clock switch time of half a day.
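
Why a short switch time can stall a WU completely is easy to see in a toy model. The Python sketch below is an illustration under stated assumptions, not a description of how FAAH or BOINC work internally: it assumes the task is removed from memory at every pause, resumes from its last checkpoint, and checkpoints at fixed intervals of progress. The 1.5-hour checkpoint interval is invented for the example, the 8-hour total is taken from the CPU times quoted above, and the difference between CPU time and wall-clock time is ignored.

```python
def progress_after_slices(slice_hours, checkpoint_hours, total_hours, n_slices):
    """Saved progress (in hours of work) after `n_slices` run slices.

    Toy model: each pause removes the task from memory, so it resumes
    from the last checkpoint. `checkpoint_hours=None` models a WU with
    no segments, which restarts from zero unless it finishes in one go.
    """
    saved = 0.0
    for _ in range(n_slices):
        if saved >= total_hours:
            break  # already finished
        reached = min(saved + slice_hours, total_hours)
        if reached >= total_hours:
            saved = total_hours  # finished within this slice
        elif checkpoint_hours is None:
            saved = 0.0  # nothing was saved, back to zero
        else:
            # Roll back to the last checkpoint passed during this slice.
            saved = (reached // checkpoint_hours) * checkpoint_hours
    return saved


# Two days of 1-hour slices, checkpoint every 1.5 hours (hypothetical).
print(progress_after_slices(1.0, 1.5, 8.0, 48))
# One 12-hour slice, WU without segments.
print(progress_after_slices(12.0, None, 8.0, 1))
```

In this model, a slice shorter than the checkpoint interval never saves anything, no matter how many slices the task gets, while a single slice longer than the whole WU finishes it even without segments.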

In bocca al lupo (good luck)

Signed,
The Trialist
(Those who Try, Err, Those Trying a lot Err more.... There are Those that never Err)

PS: The full explanation of LTD/STD here: BOINC Work Scheduler
[Aug 23, 2006 11:05:11 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Problems with a FAAH WU?

medi,

Even the Pope kills a WU occasionally. For Sekerob a WU is a Sacred Cow that must never be killed. Set the work interval to 6 hours. If that does not cure the problem then step on an ant and kill the WU. It is not necessary to pray for forgiveness if you kill only 1.
[Aug 23, 2006 11:49:52 AM]
medi
Cruncher
Joined: Jun 11, 2006
Post Count: 3
Status: Offline
Re: Problems with a FAAH WU?

Thanks to all.

I know how BOINC scheduling works, and this is why I opened this topic. I haven't had problems like this before, and I have never aborted a WU or reported one after the deadline.

Option 4 could be better than the current situation, but it could slow performance by using too much swap file, and I could lose all the data if I shut down my PC or there is a blackout.

For the moment I'm going to try suspending the other projects and changing "switch time" to 720 minutes (12 hours) to prevent future problems. I hope this solves the problem (if there is one).

I will give you more info if necessary, when and if I know more than I do now.

Thanks to you all for the help.
[Aug 23, 2006 12:58:08 PM]