World Community Grid Forums
Category: Beta Testing | Forum: Beta Test Support Forum | Thread: Clean Energy Project - Phase 2 Beta (Feb 24, 2016) [ Issues Thread ]
Thread Status: Active | Total posts in this thread: 114
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7581 Status: Recently Active
Quote:
Surely there must be a better way than to just throw 18 hours of work at a time away and give the client no credit? Can't we let the client run until it finishes the first job?

That would work if it is only a "long" WU. If the program is in an endless loop or is on a diverging path, it would never end. The 18 hour limit is there as a stopgap measure in those cases. I believe the 18 hour limit was also put in place because the scientists figured this would be sufficient for most of the machines to return a meaningful result most of the time. The fact remains there are some systems which are going to be too slow for this project, but I presume they are few and far between. There are also some molecules which are too big for even the fastest consumer-grade machines, and these would then get kicked over to the scientists' own workstation cluster.
Cheers
Sgt. Joe
*Minnesota Crunchers*
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
So the project was made explicitly opt-in for various technical reasons, yet once the opt-in is done and the comp is not hacking it, this stream of extension requests continues. Just why can't we accept that this is it, after half a decade of running?
----------------------------------------[Edit 2 times, last edit by SekeRob* at Mar 9, 2016 10:18:58 AM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Sek, I think you might (just might) be being a bit too simplistic. Some things have changed.

For a start, this is a beta, so presumably new molecules (or molecule types?) are being explored and the boundaries are being pushed. We don't know if it's the type of molecule that takes a lot of processing, or if the algorithm is not converging well (or whatever it does) with these new molecules. I also think the processing-power spread of user computers is changing, as clock speeds get turned down and CPU counts go up so as to consume less electricity.

As I said in an earlier post, a time limit is a very crude brake. I personally think it would be better to use the number of actual processing steps, but that might need too many changes, or too long a processing time on the slowest machines, to be practical.

At the end of the day we should all just let the scientists and the techies do their bit and decide the right way to go. But I don't see anything wrong with crunchers expressing opinions. What we say may influence decisions, and that's how it should be.
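A step-count brake of the kind suggested above could be sketched roughly like this (hypothetical, not WCG code; the step cap and the trivial workload are invented, only the 18-hour figure comes from this thread):

```python
# Hypothetical sketch: braking on processing steps instead of wall-clock
# time. A step count is deterministic across machines; elapsed time is not.
import time

MAX_STEPS = 1_000_000      # invented cap; not an actual CEP2 parameter
MAX_SECONDS = 18 * 3600    # the 18-hour limit discussed in this thread

def run_task(step_fn):
    start = time.monotonic()
    for step in range(MAX_STEPS):
        step_fn(step)
        if time.monotonic() - start > MAX_SECONDS:
            # Slow host: where it stops depends on hardware speed.
            return ("killed_on_time", step)
    # Step cap reached: every host stops at the same point in the work.
    return ("finished", MAX_STEPS)

status, steps_done = run_task(lambda s: None)  # a trivially fast workload
```

The trade-off mentioned above shows up directly: the slowest machines would simply take however long MAX_STEPS takes them, which may be impractical.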
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
You slam-dunk rephrased my "too simple" perfectly in just these few words: "let the scientists and techies do their bit and decide the right way." They have the statistics from thousands of results and the [new/adapted] goals set. Accept it.
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
Sgt. Joe, the endless-loop prevention is actually built into all project setup files via the <rsc_fpops_bound> parameter. Usually WCG has it set at 10x the estimated fpops of the current distribution. Given the high variability of task duration, this setting could actually lead to a premature kill / max_time_exceeded. Suppose the current mean were 1.5 hours... then the task could be killed at 15 hours on an average-performing device. I have no CEP2 production or beta task on hand to see what the limit is set at, but probably the app was hard-coded to have them die at 18 hours no matter the device speed.
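The arithmetic behind that premature-kill scenario, with illustrative numbers (the 10x margin and the 1.5-hour mean come from the post above; the device speed is invented):

```python
# Illustrative arithmetic for the 10x <rsc_fpops_bound> margin described
# above. The device speed is an assumption, not an actual CEP2 setting.
est_runtime_hours = 1.5               # the hypothetical current mean
device_flops = 4.0e9                  # invented average device speed (flop/s)

rsc_fpops_est = est_runtime_hours * 3600 * device_flops
rsc_fpops_bound = 10 * rsc_fpops_est  # WCG's usual 10x safety margin

# On an average device the bound trips at 10x the mean runtime:
kill_hours = rsc_fpops_bound / device_flops / 3600
# i.e. 15 hours, below the 18-hour hard cap, so a genuinely long task
# could be killed early by the fpops bound alone.
```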
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7581 Status: Recently Active
Quote:
Sgt. Joe, the endless-loop prevention is actually built into all project setup files via the <rsc_fpops_bound> parameter.

Thanks for the clarification. The scenario you give could certainly occur, but it would be hard to envision the current mean dropping to such a low level. I believe you are correct that the 18 hour limit is hard-coded into the program, thus a fail-safe mechanism against a runaway condition. That some WUs hit this limiter and do not finish even Job 0 is unfortunate, but that is the nature of basic research. Even the failures impart some knowledge to the researchers.
Cheers
Sgt. Joe
*Minnesota Crunchers*
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Quote:
the endless-loop prevention is actually built into all project setup files via the <rsc_fpops_bound> parameter

Quote:
probably the app was hard-coded to have them die at 18 hours no matter the device speed

Well, I can only assume they know something we don't, as this seems weird!
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
Not in the least, as apps can be configured to listen or not listen for a range of parameters. As said, I don't have a CEP2 task on a system, but I would not be surprised if it is set to the WCG standard 10x, with the logic working along the lines of a second safety: die at 18 hours, and otherwise die at max_fpops_bound. I can't remember ever having read about one that went past the 18 hours, or past 12 back when it was set to a 12 hour limit.
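That "second safety" logic could be sketched as follows (hypothetical; the 18-hour cap and the fpops-bound idea come from this thread, the function and the numbers in the usage lines are invented):

```python
# Hypothetical sketch of the dual-limit "second safety" described above:
# a task dies at whichever limit trips first, the hard-coded hour cap
# or the <rsc_fpops_bound> budget. Not actual WCG/CEP2 code.
def abort_reason(elapsed_hours, fpops_done, fpops_bound, hour_cap=18):
    if elapsed_hours >= hour_cap:
        return "max_time_exceeded (hard-coded hour cap)"
    if fpops_done >= fpops_bound:
        return "max_time_exceeded (fpops bound)"
    return None  # keep crunching

# A fast host can burn through its fpops budget before 18 hours;
# a slow host hits the hour cap first. Numbers are illustrative.
fast = abort_reason(elapsed_hours=6, fpops_done=2e16, fpops_bound=1e16)
slow = abort_reason(elapsed_hours=18, fpops_done=3e15, fpops_bound=1e16)
```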
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I say it's weird because I don't know of any algorithm which is time dependent. If it does a thousand cycles on a slow machine and is killed after a time limit, why should I let it do a million on a faster machine? What would that buy me?
But maybe they know something I don't.
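The hardware-dependence point above can be made with a bit of arithmetic (the host speeds are made up; only the 18-hour cap is from this thread):

```python
# Why a wall-clock limit gives hardware-dependent results: under the
# same 18-hour cap, hosts of different speeds complete very different
# numbers of cycles. The speeds below are invented for illustration.
CAP_SECONDS = 18 * 3600

def cycles_before_cap(cycles_per_second):
    return cycles_per_second * CAP_SECONDS

slow_host = cycles_before_cap(10)        # 648,000 cycles
fast_host = cycles_before_cap(100_000)   # 6,480,000,000 cycles
# A step-count limit would cut both hosts off at the same point in the
# computation; a time limit cuts them off at very different points.
```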
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline
Actually you do know the answer, which is not technical.