World Community Grid Forums
Thread Status: Active | Total posts in this thread: 107
GIBA | Ace Cruncher | Joined: Apr 25, 2005 | Post Count: 5374 | Status: Offline
I personally prefer to run WUs that can be finished in less than 12 hours. To be frank, from my experience on WCG, something around 5 or 6 hours seems to be the best limit for any kind of project, since it reduces the risk of losing work if the machine suffers any problem during a long crunch of 12 hours or more.

For a considerable time on WCG we crunched the NRW (Rice) project with a 12-hour limit per WU, and in the last months of the project it was changed to a 7-hour limit (I can't remember the exact limits at the beginning and end of the project). When NRW reduced the limit to 7 hours, the project sped up significantly and finished months ahead of the expected date (I think many crunchers will remember it). I'd bet that was largely due to people starting to crunch it precisely because of the reduction from 12 to 7 hours, which provided a more reliable, feasible, and reasonable opportunity to participate, mainly for old machines that could fail during a long crunch cycle and lose all the work. But that is my humble opinion!
Cheers ! GIB@
Join BRASIL - BRAZIL@GRID team and be very happy! http://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=DF99KT5DN1
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Dear GIBA,
we understand that there are different preferences amongst users, and unfortunately we won't be able to make everyone happy. Just to give you our perspective: quantum chemical calculations are usually conducted on high-performance clusters and often run for days even on that much more powerful hardware (compared to the average PC). This may give you an impression of the challenge our team faces in designing calculations that are both meaningful and, at the same time, doable on WCG within the given time and hardware constraints. We feel that the scientific relevance of this project would be severely limited if we were to go below a 12 h runtime. But we appreciate your comment!

Best wishes from your Harvard CEP team
Jack007 | Master Cruncher | CANADA | Joined: Feb 25, 2005 | Post Count: 1604 | Status: Offline
So...

Since there are calculations that run on more powerful machines, and with all the 4- and 6-core (hyper-threaded to 8 and 12) machines out there, I wonder about designing a WU to use multiple cores, instead of one WU per core. The first and biggest objection, I know, is that different WU types mean more work compiling different programs, which may make it a non-starter. I'm just musing whether the next project, or some new one, might do better by requiring, say, a 4-core minimum and using up to... well, I won't hazard a guess at a number, things are changing so fast.

I just know that when I go back to work, I'm saving up for either a four-socket machine (minimum 32 cores) or a two-socket one (24 to 32 cores). I used to run a prime-number search that would take a month per unit (some years ago), so I fear no large work unit, but primarily I look forward to harnessing multiple cores, or even multiple machines linked on my home network! Hyper-threaded, I have 21 cores at the moment (and 24 GB of RAM); imagine being able to download ONE work unit designed to use the maximum memory and cores across all platforms. OK, it would probably have to work across Windows/Linux/Mac on the same work unit, and probably all 32- or 64-bit... OK, maybe I'll stick to one machine's cores as the next logical step in distributed computing. It's fun to imagine!
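As a rough illustration of the multi-core idea (a hypothetical sketch with an invented toy task, not how WCG work units are actually built): one divisible work unit could be split across local cores with Python's standard multiprocessing module, provided the science task can be partitioned and the partial results recombined.

```python
# Toy sketch: split one divisible "work unit" across local cores.
# The task (summing squares) and all names are invented for illustration.
from multiprocessing import Pool

def crunch_slice(bounds):
    """Stand-in for the real science: sum the squares in one slice."""
    lo, hi = bounds
    return sum(n * n for n in range(lo, hi))

def run_multicore_wu(total, cores):
    """Split a work unit of size `total` into `cores` slices and recombine."""
    step = total // cores
    slices = [(i * step, (i + 1) * step) for i in range(cores)]
    slices[-1] = (slices[-1][0], total)  # last slice absorbs any remainder
    with Pool(cores) as pool:
        return sum(pool.map(crunch_slice, slices))

if __name__ == "__main__":
    # The answer must be the same whether crunched on 1 core or 4.
    assert run_multicore_wu(10_000, 4) == run_multicore_wu(10_000, 1)
```

The catch, as the Harvard reply below GIBA's post suggests, is that real quantum-chemistry steps are sequential and stateful, so they rarely partition this cleanly.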
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Dear Jack007,
most of your suggestions are actually no problem in terms of the WUs or Q-Chem, but customizations in distributed computing are very hard! It just doesn't lend itself to squeezing out the last bit of performance from a million different computers. I think I said this before: simplicity is a quality in itself! But thanks for your comments anyway. There are certainly still a number of doable improvements, and we are always keen to learn what users have in mind on this front.

Best wishes from the Harvard CEP team
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I don't know if this has been posted or not, but:

If the data is used in a chain where step 1 is done, its result feeds into step 2, which completes and feeds into step 3, and so on, why can't we take the WUs that have only completed, say, 8 steps and send them out again to another machine to crunch the rest, from step 9 onward? In essence we keep our 12-hour time limit and do what we can within it, but whatever was not done, the remainder from the last result, is sent out again to "finish off" the task for the full data set. This would not only get you a more complete data set, but could also create the shorter-running tasks some folks want, because there would be fewer steps left to do.

Just a suggestion.

Aaron
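Aaron's step-chaining idea can be sketched in a few lines. This is only an illustrative toy (all names and numbers are invented; it is not WCG/BOINC code): a work unit is a list of sequential steps, each feeding the next, and when the time budget runs out, the partial state plus the index of the next step could be handed to a second machine.

```python
# Hypothetical sketch of "split at the checkpoint": run steps until the
# time budget is spent, then return the state so another machine can resume.

def run_work_unit(steps, state, start, time_budget, step_cost):
    """Run steps[start:] until done or the budget is exhausted.

    Returns (state, next_step): next_step == len(steps) means finished;
    otherwise a follow-up work unit should resume at next_step.
    """
    spent = 0
    i = start
    while i < len(steps) and spent + step_cost <= time_budget:
        state = steps[i](state)   # step i feeds its result into step i+1
        spent += step_cost
        i += 1
    return state, i

# Toy 16-step pipeline: each step just adds its index to the running state.
steps = [lambda s, k=k: s + k for k in range(16)]

# First machine gets through 8 steps before its budget runs out...
state, nxt = run_work_unit(steps, 0, start=0, time_budget=8, step_cost=1)
# ...and a second machine finishes steps 8-15 from the checkpoint.
state, nxt = run_work_unit(steps, state, start=nxt, time_budget=8, step_cost=1)
assert nxt == len(steps)
```

The cost nasher raises below is real, though: the intermediate `state` between quantum-chemistry jobs can be far bigger than the final answer, so shipping it twice may not be worth it.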
nasher | Veteran Cruncher | USA | Joined: Dec 2, 2005 | Post Count: 1423 | Status: Offline
Replying to Aaron's suggestion above about sending partly finished WUs out again:

From what I understand, they normally get at least one computer out of the two to complete all the way through, and the partial result is used to verify the results. If both end early, they still have enough data to decide whether they want to do more research on that WU; in that case they run it in-house instead of creating a special WU and sending it to us.

As for chopping the WUs into two blocks instead of one, I do not agree. The high-end computers normally get the work done within 8 hours, and the slower ones stop at 12 hours. Not to mention that if they did multiple stages (like DDDT2), they would probably run into the same problem: lots of dry spells while they wait for the next set of results to be ready for release. As it is right now they can run lots of results, and if there is a gap in their data they can run that gap in-house. With DDDT2 they end up having to wait until a large set of work is done before they can make the next sets.

Personally, I like how this project works, and I am glad to know that even if one of my machines doesn't complete a job, the results are still very useful and there won't be a data gap, because someone else probably finished it.

Another reason not to split it into multiple jobs is size. If the results from phase 7 that are needed for phase 8 are larger than the result that needs to be sent back, you get much larger data files. It's kind of like a math test: they give you the question and you need to give them the answer (maybe a single number or a small formula), so the result they need is small. If instead they stop, send the state back, make the next WU, and put it out, they have a question plus lots of work steps that must be sent to their computer as well as sent out again to the next one.
littlepeaks | Veteran Cruncher | USA | Joined: Apr 28, 2007 | Post Count: 748 | Status: Offline
I am still running my old Pentium 4 (sigh) with Windows XP and only doing CEP2 right now. Personally, I would like to see the WUs run to completion. When I run FAAH, many of those WUs are very long, so I don't see what the difference is. My PC usually runs out of time in job 14, and I think I lose the time it has crunched on the last job that didn't finish, so that much is wasted on my PC (and others in the same boat). I vote: go for longer WUs.
kateiacy | Veteran Cruncher | USA | Joined: Jan 23, 2010 | Post Count: 1027 | Status: Offline
Huh?!?!? I just got credit for 21 hours of CPU time on a CEP2 WU. ????

E201034_876_A.29.C21H12N6S2.284.2.set1d06_2 -- kate-jetway64 -- Valid -- 1/28/11 13:10:22 -- 1/29/11 16:58:54 -- 21.03 hours -- 102.3 / 63.8 credit

This WU had problems: just as job #2 was about to finish, BOINC tried to contact the Internet while my connection was down, which bounced the WU back to the start of job #2. Then something else happened much later. Here's the end of the log:

[09:13:29] Qink name = drvman
[09:19:52] Qink name = optman
[09:19:52] Qink name = anlman
[09:24:46] End of Job
[09:24:47] Finished Job #2
[09:24:47] Starting job 3, CPU time has been restored to 40500.690000.
Killing job because cpu time limit has been exceeded. 40500.690000||35220.580000||0.000000
[ERROR] Failed to open either source or destination files while copying A.29.C21H12N6S2.284.2.bp86.svp.n.bp86.svp.n.sp/stdout.txt to A.29.C21H12N6S2.284.2.bp86.svp.n.bp86.svp.n.sp.out. Error: 2
[09:24:47] Finished Job #3
09:24:50 (14171): called boinc_finish
</stderr_txt>
]]>

This is a very slow machine (D525 Atom). I don't have it run much CEP2, but when it does, it always stops at 12 hours, usually around job #8. I think this WU could have been in progress for 21 hours. Obviously I was not watching it; I knew about the Internet connection interruption because I had to re-establish the connection when I came home.
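For what it's worth, the kill line in that log looks like three `||`-separated numbers. Assuming (this is only a guess from the surrounding messages, not documented behavior) that the first field is the restored CPU time in seconds and the second is the per-WU limit, the check the client appears to make can be mimicked like this:

```python
# Toy parser for the kill line seen in the log above. Field meanings are
# assumed: restored CPU seconds || CPU-second limit || (unknown third field).

def parse_limit_line(line):
    restored, limit, _ = (float(x) for x in line.split("||"))
    return restored, restored > limit  # (restored time, should job be killed?)

line = "40500.690000||35220.580000||0.000000"
restored, killed = parse_limit_line(line)
# killed is True here: 40500.69 s restored exceeds the 35220.58 s limit,
# which would explain why the WU died the moment job 3 started.
```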
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
This doesn't really look like a valid result. CEP2 WUs all contain 16 jobs, and this WU ended in job 3.

I said earlier that the cutoff should be closer to 16 hours, as my experience has shown my laptop gets to job 14 or 15 before ending due to excess time being used. I know it was said that the later steps are not really required but are used to verify the results of previous jobs (did I get that right?). If that is the case, why bother running the later jobs? If those jobs do have meaning, I would like a longer maximum execution time to allow them to complete. Perhaps an option could be provided (such as the Unlimited specification in the Project Specific Settings).
anhhai | Veteran Cruncher | Joined: Mar 22, 2005 | Post Count: 839 | Status: Offline
dkt, kateiacy's looks like a valid job (except that it took 21 hrs). We only see up to job #3 because the time limit finally expired. Also, jobs 14 & 15 are useful; they are not there to validate other results, they provide more information.
Here is an analogy to help you understand. Say you are looking for a date.

The first 3 jobs (jobs 0-2) give you the most important information about the candidate:
- Job 0 tells you the person's gender (easy to get).
- Job 1 tells you the person's approximate age (easy to get).
- Job 2 tells you if the person is single (takes more effort; you need to actually talk to the person, which sometimes is a LOT more effort).

The next 9 jobs (jobs 3-11) give you some more useful information, and are easy to get once you have done jobs 0-2:
- Job 3 tells you their name.
- Job 4 tells you their hobby.
- Job 5 tells you their job.
- Job 6 tells you their favorite movie.
- Job 7 tells you their favorite food.
- Job 8 tells you what they like to do in their free time.
- Job 9 tells you their favorite color.
- Job 10 tells you their likes.
- Job 11 tells you their dislikes.

The last 4 jobs (jobs 12-15) give you more detailed information, but are much harder to get:
- Job 12 tells you their phone number.
- Job 13 tells you their address.
- Job 14 tells you their birth date.
- Job 15 tells you their ring size.

So you see, all of the jobs reveal important information, but if you are looking to date a single female around the age of 20, and the candidate (WU) you are looking at is male and 44, then you don't need his phone number. CEP2 sends out all of these WUs looking to match a certain profile. The first 3 jobs tell them whether there is a match or not. If a WU is a match for the profile and not all jobs are finished, the Harvard team will re-crunch them in-house.

Edit: I am wondering if it took 21 hours because job 2 took so long. If you look closely, once it got to job 3, it quit right away. So it might be that you have to do jobs 0-2 before the WU can quit (even if that takes longer than 12 hrs).

[Edit 1 times, last edit by anhhai at Jan 30, 2011 7:08:12 AM]
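The screening logic in that analogy, running the three cheap jobs first and only spending time on the expensive ones when the candidate matches the profile, can be modeled in a few lines. This is a toy sketch with invented fields, not actual CEP2 code:

```python
# Toy model of early-exit screening: cheap jobs 0-2 decide whether the
# expensive jobs 3-15 are worth running at all.

def screen_candidate(candidate, profile):
    """Run the cheap jobs; bail out early on a mismatch.

    Returns (matched, list of jobs that were actually run).
    """
    jobs_run = [0, 1, 2]                       # cheap screening jobs
    if candidate["gender"] != profile["gender"]:
        return False, jobs_run                 # mismatch: stop here
    if abs(candidate["age"] - profile["age"]) > 5:
        return False, jobs_run
    jobs_run += range(3, 16)                   # expensive detail jobs
    return True, jobs_run

# A match runs all 16 jobs; a mismatch stops after the first 3.
match, jobs = screen_candidate({"gender": "F", "age": 22},
                               {"gender": "F", "age": 20})
```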