World Community Grid Forums
Thread Status: Active | Total posts in this thread: 107
GIBA | Ace Cruncher | Joined: Apr 25, 2005 | Post Count: 5374 | Status: Offline
I personally prefer to run WUs that can be finished in less than 12 hours. To be frank, from my experience on WCG, something around 5 or 6 hours seems to be the best limit for any kind of project, since it reduces the risk of losing work if the machine suffers any problem during a long crunch of 12 hours or more.

For a considerable time on WCG we crunched the NRW (Rice) project with a 12-hour limit per WU, and in the last months of the project it was changed to a 7-hour limit (I can't remember the exact limits at the beginning and end of the project). When NRW reduced the limit to 7 hours, the project sped up significantly and finished months ahead of the expected date (I think many crunchers will remember it). I'd bet that was largely due to people starting to crunch it precisely because of the reduction from 12 to 7 hours, which provided a more reliable, feasible, and reasonable opportunity to participate, mainly for old machines that could fail during a long crunch cycle and lose all the work. But that is my humble opinion!
Cheers ! GIB@
Join BRASIL - BRAZIL@GRID team and be very happy! http://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=DF99KT5DN1
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Dear GIBA,
we understand that there are different preferences amongst users, and unfortunately we won't be able to make everyone happy. Just to give you our perspective: quantum chemical calculations are usually conducted on high-performance clusters and often run for days even on that much more powerful hardware (compared to the average PC). This may give you an impression of the challenge our team faces in designing calculations that are both meaningful and, at the same time, doable on WCG within the given time and hardware constraints. We feel that the scientific relevance of this project would be severely limited if we were to go below a 12 h runtime. But we appreciate your comment!

Best wishes from your Harvard CEP team
Jack007 | Master Cruncher | CANADA | Joined: Feb 25, 2005 | Post Count: 1604 | Status: Offline
So...

Since there are calculations that run on more powerful machines, and with all the 4- and 6-core (hyper-threaded to 8 and 12) machines out there, I wonder about designing a WU to use multiple cores, instead of one WU per core. The first and biggest objection, I know, is that different WU types mean more work compiling different programs, which may make it a non-starter. I'm just musing whether the next project, or some new one, might do better by requiring, say, a 4-core minimum and using up to... well, I won't hazard a guess at a number, things are changing so fast.

I just know that when I go back to work, I'm saving up for either a four-socket machine (minimum 32 cores) or a two-socket one (24 to 32 cores). I used to run a prime-number search that would take a month per unit (some years ago), so I fear no large work unit, but primarily I look forward to harnessing multiple cores, or even multiple machines linked on my home network! Hyper-threaded, I have 21 cores at the moment (and 24 GB of RAM); imagine being able to download ONE work unit designed to use the maximum memory and cores across all platforms. OK, it would probably have to work across Windows/Linux/Mac on the same work unit, and probably all 32- or 64-bit... OK, maybe I'll stick to one machine's cores as the next logical step in distributed computing. It's fun to imagine!
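As a rough illustration of the multi-core idea (a hypothetical sketch with an invented toy task, not how WCG work units are actually built): one divisible work unit could be split across local cores with Python's standard multiprocessing module, provided the science task can be partitioned and the partial results recombined.

```python
# Toy sketch: split one divisible "work unit" across local cores.
# The task (summing squares) and all names are invented for illustration.
from multiprocessing import Pool

def crunch_slice(bounds):
    """Stand-in for the real science: sum the squares in one slice."""
    lo, hi = bounds
    return sum(n * n for n in range(lo, hi))

def run_multicore_wu(total, cores):
    """Split a work unit of size `total` into `cores` slices and recombine."""
    step = total // cores
    slices = [(i * step, (i + 1) * step) for i in range(cores)]
    slices[-1] = (slices[-1][0], total)  # last slice absorbs any remainder
    with Pool(cores) as pool:
        return sum(pool.map(crunch_slice, slices))

if __name__ == "__main__":
    # The answer must be the same whether crunched on 1 core or 4.
    assert run_multicore_wu(10_000, 4) == run_multicore_wu(10_000, 1)
```

The catch, as the Harvard reply below GIBA's post suggests, is that real quantum-chemistry steps are sequential and stateful, so they rarely partition this cleanly.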
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Dear Jack007,
most of your suggestions are actually no problem in terms of the WUs or Q-Chem, but customizations in distributed computing are very hard! It just doesn't lend itself to squeezing out the last bit of performance from a million different computers. I think I said this before: simplicity is a quality in itself! But thanks for your comments anyway. There are certainly still a number of doable improvements, and we are always keen to learn what users have in mind on this front.

Best wishes from the Harvard CEP team
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
I don't know if this has been posted or not, but:

If the data is used in a chain where step 1 is done, its result feeds into step 2, which completes and feeds into step 3, and so on, why can't we take the WUs that have only completed, say, 8 steps and send them out again to another machine to crunch the rest, from step 9 onward? In essence we keep our 12-hour time limit and do what we can within it, but whatever was not done, the remainder from the last result, is sent out again to "finish off" the task for the full data set. This would not only get you a more complete data set, but could also create the shorter-running tasks some folks want, because there would be fewer steps left to do.

Just a suggestion.

Aaron
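Aaron's step-chaining idea can be sketched in a few lines. This is only an illustrative toy (all names and numbers are invented; it is not WCG/BOINC code): a work unit is a list of sequential steps, each feeding the next, and when the time budget runs out, the partial state plus the index of the next step could be handed to a second machine.

```python
# Hypothetical sketch of "split at the checkpoint": run steps until the
# time budget is spent, then return the state so another machine can resume.

def run_work_unit(steps, state, start, time_budget, step_cost):
    """Run steps[start:] until done or the budget is exhausted.

    Returns (state, next_step): next_step == len(steps) means finished;
    otherwise a follow-up work unit should resume at next_step.
    """
    spent = 0
    i = start
    while i < len(steps) and spent + step_cost <= time_budget:
        state = steps[i](state)   # step i feeds its result into step i+1
        spent += step_cost
        i += 1
    return state, i

# Toy 16-step pipeline: each step just adds its index to the running state.
steps = [lambda s, k=k: s + k for k in range(16)]

# First machine gets through 8 steps before its budget runs out...
state, nxt = run_work_unit(steps, 0, start=0, time_budget=8, step_cost=1)
# ...and a second machine finishes steps 8-15 from the checkpoint.
state, nxt = run_work_unit(steps, state, start=nxt, time_budget=8, step_cost=1)
assert nxt == len(steps)
```

The cost nasher raises below is real, though: the intermediate `state` between quantum-chemistry jobs can be far bigger than the final answer, so shipping it twice may not be worth it.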
nasher | Veteran Cruncher | USA | Joined: Dec 2, 2005 | Post Count: 1423 | Status: Offline
Replying to Aaron's suggestion above about sending partly finished WUs out again:

From what I understand, they normally get at least one computer out of the two to complete all the way through, and the partial result is used to verify the results. If both end early, they still have enough data to decide whether they want to do more research on that WU; in that case they run it in-house instead of creating a special WU and sending it to us.

As for chopping the WUs into two blocks instead of one, I do not agree. The high-end computers normally get the work done within 8 hours, and the slower ones stop at 12 hours. Not to mention that if they did multiple stages (like DDDT2), they would probably run into the same problem: lots of dry spells while they wait for the next set of results to be ready for release. As it is right now they can run lots of results, and if there is a gap in their data they can run that gap in-house. With DDDT2 they end up having to wait until a large set of work is done before they can make the next sets.

Personally, I like how this project works, and I am glad to know that even if one of my machines doesn't complete a job, the results are still very useful and there won't be a data gap, because someone else probably finished it.

Another reason not to split it into multiple jobs is size. If the results from phase 7 that are needed for phase 8 are larger than the result that needs to be sent back, you get much larger data files. It's kind of like a math test: they give you the question and you need to give them the answer (maybe a single number or a small formula), so the result they need is small. If instead they stop, send the state back, make the next WU, and put it out, they have a question plus lots of work steps that must be sent to their computer as well as sent out again to the next one.
littlepeaks | Veteran Cruncher | USA | Joined: Apr 28, 2007 | Post Count: 748 | Status: Offline
I am still running my old Pentium 4 (sigh) with Windows XP and only doing CEP2 right now. Personally, I would like to see the WUs run to completion. When I run FAAH, many of those WUs are very long, so I don't see what the difference is. My PC usually runs out of time in job 14, and I think I lose the time it has crunched on the last job that didn't finish, so that much is wasted on my PC (and others in the same boat). I vote: go for longer WUs.
kateiacy | Veteran Cruncher | USA | Joined: Jan 23, 2010 | Post Count: 1027 | Status: Offline
Huh?!?!? I just got credit for 21 hours of CPU time on a CEP2 WU. ????

E201034_876_A.29.C21H12N6S2.284.2.set1d06_2 -- kate-jetway64 -- Valid -- 1/28/11 13:10:22 -- 1/29/11 16:58:54 -- 21.03 hours -- 102.3 / 63.8 credit

This WU had problems: just as job #2 was about to finish, BOINC tried to contact the Internet while my connection was down, which bounced the WU back to the start of job #2. Then something else happened much later. Here's the end of the log:

[09:13:29] Qink name = drvman
[09:19:52] Qink name = optman
[09:19:52] Qink name = anlman
[09:24:46] End of Job
[09:24:47] Finished Job #2
[09:24:47] Starting job 3, CPU time has been restored to 40500.690000.
Killing job because cpu time limit has been exceeded. 40500.690000||35220.580000||0.000000
[ERROR] Failed to open either source or destination files while copying A.29.C21H12N6S2.284.2.bp86.svp.n.bp86.svp.n.sp/stdout.txt to A.29.C21H12N6S2.284.2.bp86.svp.n.bp86.svp.n.sp.out. Error: 2
[09:24:47] Finished Job #3
09:24:50 (14171): called boinc_finish
</stderr_txt>
]]>

This is a very slow machine (D525 Atom). I don't have it run much CEP2, but when it does, it always stops at 12 hours, usually around job #8. I think this WU could have been in progress for 21 hours. Obviously I was not watching it; I knew about the Internet connection interruption because I had to re-establish the connection when I came home.
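For what it's worth, the kill line in that log looks like three `||`-separated numbers. Assuming (this is only a guess from the surrounding messages, not documented behavior) that the first field is the restored CPU time in seconds and the second is the per-WU limit, the check the client appears to make can be mimicked like this:

```python
# Toy parser for the kill line seen in the log above. Field meanings are
# assumed: restored CPU seconds || CPU-second limit || (unknown third field).

def parse_limit_line(line):
    restored, limit, _ = (float(x) for x in line.split("||"))
    return restored, restored > limit  # (restored time, should job be killed?)

line = "40500.690000||35220.580000||0.000000"
restored, killed = parse_limit_line(line)
# killed is True here: 40500.69 s restored exceeds the 35220.58 s limit,
# which would explain why the WU died the moment job 3 started.
```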
Former Member | Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
This doesn't really look like a valid result. CEP2 WUs all contain 16 jobs, and this WU ended in job 3.

I said earlier that the cutoff should be closer to 16 hours, as my experience has shown my laptop gets to job 14 or 15 before ending due to excess time being used. I know it was said that the later steps are not really required but are used to verify the results of previous jobs (did I get that right?). If that is the case, why bother running the later jobs? If those jobs do have meaning, I would like a longer maximum execution time to allow them to complete. Perhaps an option could be provided (such as the Unlimited specification in the Project Specific Settings).
anhhai | Veteran Cruncher | Joined: Mar 22, 2005 | Post Count: 839 | Status: Offline
dkt, kateiacy's looks like a valid job (except that it took 21 hrs). We only see up to job #3 because the time limit finally expired. Also, jobs 14 & 15 are useful; they are not there to validate other results, they provide more information.
Here is an analogy to help you understand. Say you are looking for a date.

The first 3 jobs (jobs 0-2) give you the most important information about the candidate:
- Job 0 tells you the person's gender (easy to get).
- Job 1 tells you the person's approximate age (easy to get).
- Job 2 tells you if the person is single (takes more effort; you need to actually talk to the person, which sometimes is a LOT more effort).

The next 9 jobs (jobs 3-11) give you some more useful information, and are easy to get once you have done jobs 0-2:
- Job 3 tells you their name.
- Job 4 tells you their hobby.
- Job 5 tells you their job.
- Job 6 tells you their favorite movie.
- Job 7 tells you their favorite food.
- Job 8 tells you what they like to do in their free time.
- Job 9 tells you their favorite color.
- Job 10 tells you their likes.
- Job 11 tells you their dislikes.

The last 4 jobs (jobs 12-15) give you more detailed information, but are much harder to get:
- Job 12 tells you their phone number.
- Job 13 tells you their address.
- Job 14 tells you their birth date.
- Job 15 tells you their ring size.

So you see, all of the jobs reveal important information, but if you are looking to date a single female around the age of 20, and the candidate (WU) you are looking at is male and 44, then you don't need his phone number. CEP2 sends out all of these WUs looking to match a certain profile. The first 3 jobs tell them whether there is a match or not. If a WU is a match for the profile and not all jobs are finished, the Harvard team will re-crunch them in-house.

Edit: I am wondering if it took 21 hours because job 2 took so long. If you look closely, once it got to job 3, it quit right away. So it might be that you have to do jobs 0-2 before the WU can quit (even if that takes longer than 12 hrs).

[Edit 1 times, last edit by anhhai at Jan 30, 2011 7:08:12 AM]
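The screening logic in that analogy, running the three cheap jobs first and only spending time on the expensive ones when the candidate matches the profile, can be modeled in a few lines. This is a toy sketch with invented fields, not actual CEP2 code:

```python
# Toy model of early-exit screening: cheap jobs 0-2 decide whether the
# expensive jobs 3-15 are worth running at all.

def screen_candidate(candidate, profile):
    """Run the cheap jobs; bail out early on a mismatch.

    Returns (matched, list of jobs that were actually run).
    """
    jobs_run = [0, 1, 2]                       # cheap screening jobs
    if candidate["gender"] != profile["gender"]:
        return False, jobs_run                 # mismatch: stop here
    if abs(candidate["age"] - profile["age"]) > 5:
        return False, jobs_run
    jobs_run += range(3, 16)                   # expensive detail jobs
    return True, jobs_run

# A match runs all 16 jobs; a mismatch stops after the first 3.
match, jobs = screen_candidate({"gender": "F", "age": 22},
                               {"gender": "F", "age": 20})
```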