World Community Grid Forums
Thread Status: Active Total posts in this thread: 107
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Dear gb077492,
I think running more than one CEP2 instance on your single-core P4 with 1 GB RAM is just stretching the hardware limits. CEP2 already uses up most of the physical CPU power, so there is not much room for HT, and each WU uses up 512 MB of memory, so you are likely already swapping... But you should be perfectly fine running one instance at a time. Best, your Harvard CEP team
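The memory arithmetic here can be sketched in a few lines. A minimal back-of-envelope check, using the figures from the post (512 MB per workunit, 1 GB of RAM); the OS/application overhead figure is an assumption for illustration:

```python
# Back-of-envelope memory budget for concurrent CEP2 tasks on a 1 GB P4.
# 512 MB per workunit comes from the post; the OS overhead is an assumption.

RAM_MB = 1024          # physical RAM
WU_MB = 512            # memory per CEP2 workunit
OS_OVERHEAD_MB = 150   # assumed footprint of OS + other applications

def will_swap(n_tasks: int) -> bool:
    """True if n concurrent tasks would exceed physical RAM."""
    return n_tasks * WU_MB + OS_OVERHEAD_MB > RAM_MB

print(will_swap(1))  # False: 512 + 150 MB fits in 1 GB
print(will_swap(2))  # True: 1024 + 150 MB forces swapping
```

With two tasks the workunits alone fill all of physical RAM, which is why swapping is the expected outcome.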
Dataman
Ace Cruncher Joined: Nov 16, 2004 Post Count: 4865 Status: Offline Project Badges:
There has been some misunderstanding about the soft timeout: there is no soft timeout in the current release. At one point there was discussion of having such a feature, but it was not implemented. Sorry for the misunderstanding. We have started discussing adding some sort of soft timeout again. Stay tuned. Thanks, armstrdj

Thanks for the information.
gb077492
Advanced Cruncher Joined: Dec 24, 2004 Post Count: 96 Status: Offline
I think running more than one CEP2 instance on your single core P4 with 1GB RAM is just stretching the hardware limits.

I can understand why you would say that, but I have a 10K RPM disk drive, and the VM size of each process seems to stay well below 500K. My wife can use SeaMonkey to do e-mail and access the web quite happily, with only the odd stutter at WU start-up time, even with 2 CEP2 tasks running. If it wasn't running without issue, I wouldn't be running it. Mike.
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline Project Badges:
What I would suggest is that the soft time-out be fixed, but that it should also depend on getting through enough of the steps for the result to be useful on its own. From what I read, I think that means not until step 8 is complete. Personally I would be happy to extend the hard time-out to 15 hours or even more. I have observed some WUs where step 2 has run really long, even beyond the 10-hour mark, and it seems silly to kill a WU that soon. Mike

I agree with the above, and hope it's doable. I have two faster machines that always finish all 16 jobs (unless they hit the dreaded RC 100), and two slower machines that are always stopped at 12 hrs. I would love for the slower machines to always be given enough time to return a useful result, even if that took much longer than 12 hrs. Kate
nasher
Veteran Cruncher USA Joined: Dec 2, 2005 Post Count: 1423 Status: Offline Project Badges:
Personally I wouldn't mind results running longer than 12 hours... to be honest, I think a 1-day timeout would be better for me, but I also understand if you think the extra results won't be worth the extra time.
I am curious, though:
1) If both runs are on slower computers and they don't get through step 12 or so, is it still good info?
2) If the result file isn't any larger, is there any real reason to end at 12 hours?

I am sure I can think of more, but I've got to go to work now.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi everybody,
thanks for the feedback on the soft-timeout and the timeout extension as a user option. We had a discussion with our friends at IBM yesterday, and these topics are on the near-future agenda.

Dear nasher,
1) Yes, it is really fine to have incomplete results. We have about 10 million candidates in mind right now, and not all of them will be subject to all calcs anyway. There will be holes in the big result table - no matter what. The most promising ones (which we can identify from a few jobs already) will be finished up in-house, if necessary.
2) Well, the reason really is that many average users don't like to crunch on a workunit for too long, and that is understandable. We know that the enthusiasts don't mind and prefer to run longer - that's why we are trying to make it a user option.

Best wishes from your Harvard CEP team

P.S. @Mike: Sorry, in that case I really don't know what's going on. Sweet HD, by the way!
anhhai
Veteran Cruncher Joined: Mar 22, 2005 Post Count: 839 Status: Offline Project Badges:
10 million candidates = 10 million unique WU (right?) = 20 million WU that we crunch.
WUs already crunched = 1,751,532
Average WUs crunched per day = 8,757
WUs crunched yesterday = 14,308

Assuming we keep averaging what we did yesterday, the following tells us how long this project will last: (20 million WU - 1,751,532) / 14,308 = 1,275 days (or about 3.5 yrs).

I now see why they are trying to increase the participation rate. A soft-timeout should increase the number of WUs done per day; a timeout extension, however, would decrease it. Basically, what I am saying is that we need more crunchers for this project so we can get this done faster.
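The estimate works out as stated; reproducing the arithmetic in a couple of lines, under the same assumptions (2 redundant copies per candidate, yesterday's throughput held constant):

```python
# Rough project-duration estimate, reproducing anhhai's arithmetic.
TOTAL_WU = 20_000_000    # 10 million candidates x 2 redundant copies
DONE_WU = 1_751_532      # WUs already crunched
RATE_PER_DAY = 14_308    # yesterday's throughput

remaining_days = (TOTAL_WU - DONE_WU) / RATE_PER_DAY
print(round(remaining_days))           # 1275 days
print(round(remaining_days / 365, 1))  # 3.5 years
```

Of course the throughput assumption is the weak link: more crunchers, or a soft-timeout, would shrink the figure.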
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
2) Well, the reason really is that many average users don't like to crunch on a workunit for too long, and that is understandable. We know that the enthusiasts don't mind that and prefer to run longer - that's why we try to get it to work as a user option

As an enthusiast, what I want is just to crunch the work and return the results. Unlike some, I really do not want to participate in the software engineering decision of whether I want 8-, 12- or 16-hour WUs. If the scientists want to cut off at 12 hours, then fine. If you remember, HCC ran for many years, and with hardly a question from the crunchers, the scientists changed the software and reduced the average WU time. Seems they didn't need our advice. Huh! But thanks for asking if I cared.

Shorter increases the number of WUs, and may increase the overall data volume. Longer increases the volume of repair WUs, which take longer to crunch, and the risk to slower (and off-line) machines that may not be able to finish in time. This whole extended discussion on WU length (CEP2, HCMD2) is divisive, between the haves and have-nots (as measured by machine Gflops). It's fine to cater to the fast machines, but do not abandon the user base.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Dear anhhai,
yes, we'll not run out of work prematurely ;). The good news is that in the course of the project we learn more and more about the candidates, which will help us to focus and prioritize our search. So we probably will not do all of the molecules we have on the books right now, but we'll likely add others as we go along. The scope of CEP2 really is in flux and responds to what we learn.

Dear astrolab, it's all perfectly fine - people who don't want to be bothered with details can stick to the 12h default; others will hopefully have an option to extend it as they like. Repair WUs are actually not an issue because the validation is performed early in the workunit, so it is no problem if one wingman gets further than the other. Thanks for crunching and best wishes from your Harvard CEP team
||
|
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3010 Status: Offline Project Badges:
BTW, just to add to this post, I'm currently running an HFCC job which looks as though it'll way surpass any lengthy CEP2 job I've had - it's currently at 60% after 12 hrs, with another estimated 7 1/2 hrs to go... Normally, on my machine, HFCC jobs average out at 7.38 hrs (so this is certainly going to have a severe adverse effect on my DCF rating). Although this is the exception, it's by no means the only exception I've had...
At least if the CEP2 limit was 'upped', we'd know what the limit would be...

Edit: Well, my long HFCC WU took 19:36:50 hrs in CPU time (20:38:44 hrs elapsed), and as expected, threw my computer's DCF rating completely off. Oh well, it'll correct itself again in time...
[Edit 1 times, last edit by gb009761 at Jan 16, 2011 1:40:59 AM]
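For readers unfamiliar with the DCF: BOINC's duration correction factor scales runtime estimates, and its update is asymmetric - one over-long task raises it immediately, while it only eases back down as normal-length tasks complete. A simplified sketch of that behaviour (not BOINC's actual code; the 0.1 decay constant is an assumed value):

```python
# Simplified, illustrative model of a BOINC-style duration correction
# factor (DCF). NOT the real client code: the asymmetry (jump up fast,
# decay down slowly) matches observed behaviour, but the 0.1 decay
# constant is an assumption.

def update_dcf(dcf: float, actual_h: float, estimated_h: float) -> float:
    ratio = actual_h / estimated_h
    if ratio > dcf:
        return ratio                  # one long task raises DCF at once
    return dcf + 0.1 * (ratio - dcf)  # recovery is gradual

dcf = 1.0
dcf = update_dcf(dcf, 19.6, 7.38)  # the 19:36 h HFCC job: DCF jumps to ~2.66
for _ in range(20):                # twenty normal 7.38 h jobs later...
    dcf = update_dcf(dcf, 7.38, 7.38)
print(round(dcf, 2))               # ...DCF has drifted back toward 1.0
```

This is only meant to show why a single 19-hour outlier can skew runtime estimates for days afterwards, and why it does "correct itself again in time".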