| World Community Grid Forums
Thread Status: Active Total posts in this thread: 22
Zigfried
Senior Cruncher Brazil Joined: Dec 12, 2005 Post Count: 368 Status: Offline Project Badges:
|
Hi, I was crunching 4 WUs for CEP2 and I noticed that my WUs looped back.
They were between 93% and 99%, and when one of them finished, all the others jumped back to 15%.

26/09/2012 19:18:46|World Community Grid|Message from server: No tasks sent
26/09/2012 19:18:46|World Community Grid|Message from server: No tasks are available for The Clean Energy Project - Phase 2
26/09/2012 19:18:46|World Community Grid|Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2
26/09/2012 19:18:46|World Community Grid|Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2 (Type A)
26/09/2012 19:18:46|World Community Grid|Message from server: No tasks are available for the applications you have selected.
26/09/2012 23:49:22|World Community Grid|Task E209671_772_C.31.C25H14N4S2.01195678.4.set1d06_2 exited with zero status but no 'finished' file
26/09/2012 23:49:22|World Community Grid|If this happens repeatedly you may need to reset the project.
26/09/2012 23:49:22|World Community Grid|Computation for task E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0 finished
26/09/2012 23:49:23|World Community Grid|Restarting task E209671_772_C.31.C25H14N4S2.01195678.4.set1d06_2 using cep2 version 640
26/09/2012 23:49:24|World Community Grid|Task E209652_905_C.32.C23H11N7S2.02188025.3.set1d06_1 exited with zero status but no 'finished' file
26/09/2012 23:49:24|World Community Grid|If this happens repeatedly you may need to reset the project.
26/09/2012 23:49:24|World Community Grid|Task E209610_208_C.31.C25H12N2S4.02237335.2.set1d06_2 exited with zero status but no 'finished' file
26/09/2012 23:49:24|World Community Grid|If this happens repeatedly you may need to reset the project.
26/09/2012 23:49:25|World Community Grid|Restarting task E209652_905_C.32.C23H11N7S2.02188025.3.set1d06_1 using cep2 version 640
26/09/2012 23:49:25|World Community Grid|Restarting task E209610_208_C.31.C25H12N2S4.02237335.2.set1d06_2 using cep2 version 640
26/09/2012 23:49:27|World Community Grid|Sending scheduler request: To fetch work.
26/09/2012 23:49:27|World Community Grid|Requesting new tasks for CPU
26/09/2012 23:49:29|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_0
26/09/2012 23:49:29|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_1
26/09/2012 23:49:39|World Community Grid|Finished upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_0
26/09/2012 23:49:39|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_2
26/09/2012 23:49:40|World Community Grid|Scheduler request completed: got 0 new tasks
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks sent
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks are available for The Clean Energy Project - Phase 2
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks are available for Discovering Dengue Drugs - Together - Phase 2 (Type A)
26/09/2012 23:49:40|World Community Grid|Message from server: No tasks are available for the applications you have selected.
26/09/2012 23:49:48|World Community Grid|Finished upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_2
26/09/2012 23:49:48|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_3
26/09/2012 23:49:51|World Community Grid|Finished upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_3
26/09/2012 23:49:51|World Community Grid|Started upload of E209667_342_C.29.C26H16SSeSi.01509536.3.set1d06_0_4
26/09/2012 23:49:51|World Community Grid|Sending scheduler request: To fetch work.

When I checked the log, this particular message caught my attention:

26/09/2012 23:49:24|World Community Grid|Task E209610_208_C.31.C25H12N2S4.02237335.2.set1d06_2 exited with zero status but no 'finished' file
26/09/2012 23:49:24|World Community Grid|If this happens repeatedly you may need to reset the project.

Thank you
[Edit 1 times, last edit by Zigfried at Sep 27, 2012 3:32:53 AM]
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
My interpretation is that your system was too busy. Reduce the number of concurrent CEP2 tasks or, as I did on Linux, raise the threshold at which BOINC pauses to e.g. 35% [default 25%]. Any time the system goes into overload, BOINC then pauses for 10 seconds [with LAIM (leave applications in memory) on, of course]. Since doing that I have had a 100% success rate. Another solution, if you were using the system for something heavier at the time: set <exclusive_app>xxxx</exclusive_app> in cc_config.xml. I've done so for system update programs such as apt-get and synaptic. A 5-minute BOINC pause is nothing compared with the many hours of crunching progress that would otherwise be lost.
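For anyone who wants to try the exclusive-app route, here is a minimal sketch of what the cc_config.xml entry could look like. It assumes the usual BOINC layout in which `<exclusive_app>` sits inside `<options>`, and the two executable names are simply the programs named in the post above; substitute whatever heavy program you run on your own machine.

```xml
<!-- cc_config.xml, placed in the BOINC data directory -->
<cc_config>
  <options>
    <!-- BOINC suspends computation while any listed program is running. -->
    <!-- Names below are examples taken from the post; adjust to your system. -->
    <exclusive_app>apt-get</exclusive_app>
    <exclusive_app>synaptic</exclusive_app>
  </options>
</cc_config>
```

The client only picks up the change after it re-reads its config files or is restarted.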
|
||
|
|
Zigfried
Senior Cruncher Brazil Joined: Dec 12, 2005 Post Count: 368 Status: Offline Project Badges:
|
Thank you, Sek.
I will check those options. :)
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Hi, I was crunching 4 WUs for CEP2 and I noticed that my WUs looped back."

Because of the long checkpoints, I created a profile for my home laptop and set it to run 2 CEP2 WUs at the same time. After this, I started to see the same behaviour. Right now I have two WUs at 7.48 hrs (60%) and 8.09 hrs (63%), and both have their last checkpoint near the 38-minute mark. Yesterday, a different unit had progressed past 90%, and its checkpoint was also at 38 mins. I have changed the home profile to 1 WU at a time. I will let you know if this fixes the problem.
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi carlosomarcp,
Thanks for sharing. Yes, there are different ways to tweak and optimize the performance of WCG, and of CEP2 in particular.
Best wishes
Your Harvard CEP team
||
|
|
rbotterb
Senior Cruncher United States Joined: Jul 21, 2005 Post Count: 401 Status: Offline Project Badges:
|
I've adjusted how I process CEP2 WUs, and it seems to work pretty well during the work week. Basically, I know that when I'm at work I can run a good 9 hours solid, M-F, sometimes longer if I have an extra long day. For many of these CEP2 WUs, I find that if I let one run for around 3 hours, give or take, on one day and then suspend it, it tends to resume from a checkpoint pretty close to where it left off the next time I bring up my laptop on a weekday. On the second weekday, the CEP2 WU then starts right before the long step, and if I run it all day it will generally run to completion. Now, this doesn't work for all CEP2 WUs: sometimes I'll get one that seems to want to restart at the beginning over and over. If I get one of those, on the second fallback to the start I just abort it and let a bigger cruncher get it on the rebound. This past week I managed to get 4-5 CEP2 WUs through successfully and had only one abort, so at least I'm getting a decent amount of work done.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It seems that some of the WUs have that problem (the checkpoint problem).
I changed the settings to receive 8 WUs at the same time. At any given time, at least one of them has its only checkpoint at around 38 minutes, even when it is 98% processed. Sometimes 6 out of 8 have this problem. They DO finish. But you cannot restart the machine...
Regards.
||
|
|
rbotterb
Senior Cruncher United States Joined: Jul 21, 2005 Post Count: 401 Status: Offline Project Badges:
|
I've been seeing more and more CEP2 WUs with this checkpoint issue, where a WU runs almost to completion but on a restart is back at 30 minutes, give or take, and all the rest of the crunching is lost. For now I've dropped CEP2 from my crunching list and am working on other projects instead. I'll probably only bring down some CEP2 WUs when I know I'll have a long day at the office with my laptop up for 12+ hours, so that I can start them when I come in in the morning and know they will crunch all the way to completion before I leave for the evening. It just takes a lot longer to get to that Gold Badge now....
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Dear rbotterb and carlosomarcp,
We have discussed the CEP2 checkpointing issue in great detail in several threads in this forum. You can also find the bottom line in our "Tips and Custom Settings" document linked in the footer.
Best wishes from
Your Harvard CEP team
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi cleanenergy,
I have seen several posts about this problem, but so far none where the issue is that a WU's last checkpoint is at around minute 30 (on my machine). I have read and applied the tips. The problem here is that for most WUs you have to wait out the full 12 hrs.

I crunch 8 WUs at the same time. Sometimes 7 of the 8 WUs have only one checkpoint. In fact, it is hard to find more than 4 WUs (I mean, 4 out of 8) that checkpoint more than once. I decided to crunch CEP2 WUs only on my home laptop, since I have to shut down my work laptop from time to time.

I believe that this might be an important problem. Since most of the people who joined this project crunch only one WU at a time, it could be hard for them to provide evidence, statistics, or more detail. This might be happening to a lot of people, but when they get to the forum (if they ever do) they could accept as a good answer that the second or third checkpoint (I don't remember which) is really long, and that this is why the WU almost resets. If this is a common problem, crunchers might leave the project. I stayed because I really like it, and because I have a machine that can stay turned on all day.

Anyway, I'm probably wrong, and this is a known issue. But what I'm trying to say is: please do not dismiss this issue without checking that it is something you already know about.
||
|
|
|