Thread Status: Active
Total posts in this thread: 22
Posts: 22   Pages: 3   [ Previous Page | 1 2 3 | Next Page ]
This topic has been viewed 3467 times and has 21 replies
Zigfried
Senior Cruncher
Brazil
Joined: Dec 12, 2005
Post Count: 368
Status: Offline
Re: problem with computing

Hi, I was crunching 4 WUs for CEP2 and I noticed that my WUs loop back.

They were between 93% and 99%, and when one of them finished, all the others jumped back to 15%.

26/09/2012 19:18:46|World Community Grid|Message from server: No tasks sent
26/09/2012 19:18:46|World Community Grid|Message from server: No tasks are available for The Clean Energy Project - Phase 2
26/09/2012 19:18:46|World Community Grid|Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2
26/09/2012 19:18:46|World Community Grid|Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2 (Type A)
26/09/2012 19:18:46|World Community Grid|Message from server: No tasks are available for the applications you have selected.
26/09/2012 23:49:22|World Community Grid|Task E209671_772_C.31.C25H14N4S2.01195678.4.set1d06_2 exited with zero status but no 'finished' file
26/09/2012 23:49:22|World Community Grid|If this happens repeatedly you may need to reset the project.
26/09/2012 23:49:22|World Community Grid|Computation for task E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0 finished
26/09/2012 23:49:23|World Community Grid|Restarting task E209671_772_C.31.C25H14N4S2.01195678.4.set1d06_2 using cep2 version 640
26/09/2012 23:49:24|World Community Grid|Task E209652_905_C.32.C23H11N7S2.02188025.3.set1d06_1 exited with zero status but no 'finished' file
26/09/2012 23:49:24|World Community Grid|If this happens repeatedly you may need to reset the project.
26/09/2012 23:49:24|World Community Grid|Task E209610_208_C.31.C25H12N2S4.02237335.2.set1d06_2 exited with zero status but no 'finished' file
26/09/2012 23:49:24|World Community Grid|If this happens repeatedly you may need to reset the project.
26/09/2012 23:49:25|World Community Grid|Restarting task E209652_905_C.32.C23H11N7S2.02188025.3.set1d06_1 using cep2 version 640
26/09/2012 23:49:25|World Community Grid|Restarting task E209610_208_C.31.C25H12N2S4.02237335.2.set1d06_2 using cep2 version 640
26/09/2012 23:49:27|World Community Grid|Sending scheduler request: To fetch work.
26/09/2012 23:49:27|World Community Grid|Requesting new tasks for CPU
26/09/2012 23:49:29|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_0
26/09/2012 23:49:29|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_1
26/09/2012 23:49:39|World Community Grid|Finished upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_0
26/09/2012 23:49:39|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_2
26/09/2012 23:49:40|World Community Grid|Scheduler request completed: got 0 new tasks
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks sent
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks are available for The Clean Energy Project - Phase 2
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2 (Type A)
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks are available for the applications you have selected.
26/09/2012 23:49:48|World Community Grid|Finished upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_2
26/09/2012 23:49:48|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_3
26/09/2012 23:49:51|World Community Grid|Finished upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_3
26/09/2012 23:49:51|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_4
26/09/2012 23:49:51|World Community Grid|Sending scheduler request: To fetch work.

When I checked the log, this particular message caught my attention:
26/09/2012 23:49:24|World Community Grid|Task E209610_208_C.31.C25H12N2S4.02237335.2.set1d06_2 exited with zero status but no 'finished' file
26/09/2012 23:49:24|World Community Grid|If this happens repeatedly you may need to reset the project.
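For anyone who wants to check how often this message shows up in their own log, here is a minimal sketch in Python. It assumes the log lines use the same `date time|project|message` layout as the excerpt above; the function name `count_unexpected_exits` is made up for illustration and is not part of BOINC.

```python
import re
from collections import Counter

# Matches the message text quoted in the log excerpt above.
NO_FINISHED = re.compile(
    r"^Task (\S+) exited with zero status but no 'finished' file"
)


def count_unexpected_exits(log_lines):
    """Count, per task name, how often the 'no finished file' message appears.

    Assumes each line looks like 'timestamp|project|message'.
    Lines that do not have three '|'-separated fields are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue
        match = NO_FINISHED.match(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the log in this post.
sample = [
    "26/09/2012 23:49:22|World Community Grid|Task E209671_772_C.31.C25H14N4S2.01195678.4.set1d06_2 exited with zero status but no 'finished' file",
    "26/09/2012 23:49:22|World Community Grid|If this happens repeatedly you may need to reset the project.",
    "26/09/2012 23:49:24|World Community Grid|Task E209652_905_C.32.C23H11N7S2.02188025.3.set1d06_1 exited with zero status but no 'finished' file",
]

for task, n in count_unexpected_exits(sample).items():
    print(f"{task}: {n}")
```

A task that appears many times in the result is one that keeps exiting and restarting, which is the case the client message warns about.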

Thank you
----------------------------------------

----------------------------------------
[Edit 1 times, last edit by Zigfried at Sep 27, 2012 3:32:53 AM]
[Sep 27, 2012 3:30:02 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: problem with computing

My interpretation is that your system was too busy. Reduce the number of concurrent CEP2 tasks or, as I did on Linux, set the system pause threshold to e.g. 35% [default 25%]. Any time the system goes into overload, BOINC pauses for 10 seconds [with LAIM (leave applications in memory) on, of course]. Since then I have had a 100% success rate. Another solution, if you were using the system for something heavier at the time: set <exclusive_app>xxxx</exclusive_app> in cc_config.xml. I've done so for system update programs such as apt-get and synaptic. The 5-minute BOINC pause is nothing compared with the many hours of crunching progress that would otherwise be lost.
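As a rough illustration of the exclusive_app approach, here is a small Python sketch that builds the XML for a cc_config.xml with one <exclusive_app> entry per program. The helper name `build_cc_config` is made up for this example. The program names are the ones mentioned above; whether the client matches those exact process names on your system is an assumption you should verify, and if you already have a cc_config.xml you would want to merge the entries into it rather than overwrite the file.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusive_apps):
    """Return cc_config XML text with one <exclusive_app> per program name.

    Layout used: <cc_config><options><exclusive_app>...</exclusive_app>
    </options></cc_config>. The program names are passed through as-is.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for name in exclusive_apps:
        ET.SubElement(options, "exclusive_app").text = name
    return ET.tostring(root, encoding="unicode")


# Program names taken from the post; adjust to what actually runs on your box.
config_xml = build_cc_config(["apt-get", "synaptic"])
print(config_xml)
```

The printed text would go into cc_config.xml in the BOINC data directory, and the client has to re-read its config files (or be restarted) before the change takes effect.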
[Sep 27, 2012 7:52:15 AM]
Zigfried
Senior Cruncher
Brazil
Joined: Dec 12, 2005
Post Count: 368
Status: Offline
Re: problem with computing

Thank you Sek.

I will check those options.

:)
----------------------------------------

[Sep 27, 2012 11:34:54 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: problem with computing

Hi, I was crunching 4 WUs for CEP2 and I noticed that my WUs loop back.


Because of the long checkpoints, I created a profile for my home laptop and selected to run 2 CEP2 WUs at the same time. After this, I started to see the same behaviour:
Right now, I have two WUs at 7.48 hrs (60%) and 8.09 hrs (63%), and both have their last checkpoint near the 38-minute mark.
Yesterday, the progress of a different unit was more than 90%, and its checkpoint was also at 38 minutes.

I have modified the home profile to run 1 WU at a time. I will let you know if this fixes the problem.
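To put numbers on what a restart would cost, here is a small Python sketch. It assumes the task resumes exactly from its last checkpoint, and it reads the "7.48" and "8.09" figures above as decimal hours, which is only a guess at how the manager displays them. The function names are made up for illustration.

```python
def lost_minutes(elapsed_min, last_checkpoint_min):
    """Minutes of work that must be redone if the task restarts now."""
    if elapsed_min <= 0:
        raise ValueError("elapsed time must be positive")
    if last_checkpoint_min > elapsed_min:
        raise ValueError("checkpoint cannot be later than elapsed time")
    return elapsed_min - last_checkpoint_min


def lost_fraction(elapsed_min, last_checkpoint_min):
    """Share of the run so far that would be thrown away on a restart."""
    return lost_minutes(elapsed_min, last_checkpoint_min) / elapsed_min


# Elapsed times from the post (assumed decimal hours), checkpoint at 38 min.
for hours in (7.48, 8.09):
    elapsed = hours * 60
    print(
        f"{hours} h elapsed: {lost_minutes(elapsed, 38):.0f} min redone "
        f"({lost_fraction(elapsed, 38):.0%} of the run so far)"
    )
```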
[Sep 30, 2012 7:37:16 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: problem with computing

Hi carlosomarcp,
thanks for sharing - yes, there are different ways to tweak and optimize the performance of WCG and CEP2 in particular.
Best wishes
Your Harvard CEP team
[Oct 2, 2012 5:26:26 PM]
rbotterb
Senior Cruncher
United States
Joined: Jul 21, 2005
Post Count: 401
Status: Offline
Re: problem with computing

I've adjusted my processing of CEP2 WUs, and it seems to work pretty well during the work week. Basically I know that when at work, I can run a good 9 hours solid M-F, sometimes longer if I have an extra long day. For many of these CEP2 WUs, I find that if I let one run for around 3 hours, give or take, during one day and then suspend it, it seems to stay at a checkpoint pretty close to that point the next time I bring up my laptop on another weekday. Then on the second weekday, the CEP2 WU starts right before the long step, and if I run it all day it generally will run to completion. Now this doesn't work for all CEP2 WUs: sometimes I'll get one that seems to want to restart at the beginning over and over. If I get one of those, on the second fallback to the start I just abort it and let a bigger cruncher get it on the rebound. This past week I managed to get 4-5 CEP2 WUs through successfully and had only one abort for the week, so at least I'm getting a decent set of work done.
[Oct 5, 2012 7:48:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: problem with computing

It seems that some of the WUs have that problem (the checkpoint problem).
I changed the settings to receive 8 WUs at the same time. At any time, at least one of them has only a checkpoint at around 38 minutes, even when it is 98% processed. Sometimes 6 out of 8 have this problem.
They DO finish. But you cannot restart the machine...

Regards.
[Oct 16, 2012 2:32:07 AM]
rbotterb
Senior Cruncher
United States
Joined: Jul 21, 2005
Post Count: 401
Status: Offline
Re: problem with computing

I've been seeing more and more CEP2 WUs with this checkpoint issue, where a WU runs almost to completion but on a restart is back to 30 minutes, give or take, and all the rest of the crunching is lost. For now I've dropped CEP2 from my crunching list and am working on other projects instead. I'll probably only bring down some CEP2 WUs when I know I'll have a long day at the office, with my laptop up 12+ hours, so I can start the CEP2 WUs when I come in in the morning and know that they will crunch all the way to completion before I leave for the evening. It just takes a lot longer to get to that Gold Badge now.... ;)
[Oct 19, 2012 1:20:29 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: problem with computing

Dear rbotterb and carlosomarcp,
we have discussed the CEP2 checkpointing issue in great detail in several threads in this forum. You can also find the bottom line in our "Tips and Custom Settings" document linked in the footer.
Best wishes from
Your Harvard CEP team
[Oct 20, 2012 2:48:23 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: problem with computing

Hi cleanenergy,

I have seen several posts about this problem, but until now I haven't seen any where the problem is that a WU's last checkpoint is at around minute 30 (on my machine). I have read and applied the tips.
The problem here is that you have to wait the full 12 hrs for most of the WUs. I crunch 8 WUs at the same time. Sometimes, 7 of the 8 WUs have only one checkpoint. Actually, it is hard to find more than 4 WUs that checkpoint more than once (I mean, 4 out of 8).
I decided to crunch CEP2 WUs only on my home laptop, since I have to shut down my work laptop from time to time.

I believe that this might be an important problem. Since most of the people who joined this project crunch only one WU at a time, it could be hard for them to provide evidence, statistics, or more detail. This might be happening to a lot of people, but when they get to the forum (if they ever do) they could accept as a good answer that the second or third checkpoint (I don't remember which) is really long, and that's why the WU almost resets.

If this is a common problem, crunchers might leave the project. I stayed because I really like it, and because I have a machine that I can keep turned on all day.

Anyway, I'm probably wrong, and this is a known issue. But what I'm trying to say is: please do not dismiss this issue without checking that it is something you already know.
[Oct 21, 2012 1:57:13 AM]