World Community Grid - View Thread - Beta Test - Outsmart Ebola Together - v7.14

Thread: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

Thread Status: Active
Total posts in this thread: 99

This topic has been viewed 354326 times and has 98 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

Got 4 of the batch 929 rigid units. Switched LAIM off and suspended each at a checkpoint after they'd all checkpointed at least once. One is validated already so they're looking good.

Having seen how these run, and if they're typical of what we might get in production, I've concluded that I'm not so bothered about the occasional long checkpoint or the iffy progress display. It's not good, and we might have to remind a few people that things are not perfect, but the techs have got it working and it's far more important that we get on with the science.

Well done, guys!

[Jan 8, 2015 11:51:48 PM]

deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4894
Status: Offline


Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

The 308s are checkpointing and finishing, so let's get this going.

[Jan 9, 2015 1:54:18 AM]

KWSN-A Shrubbery
Senior Cruncher
Joined: Jan 8, 2006
Post Count: 476
Status: Offline


Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

Was away when these went out so I wasn't able to battle test them. Nine pages valid or pending.

Looks like a go.

[Jan 9, 2015 2:03:05 AM]

slakin
Advanced Cruncher
Joined: Jul 4, 2008
Post Count: 79
Status: Offline


Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

I thought it was a bit strange: after suspending and restarting a work unit, I happened to check the work unit properties and found that the CPU time and checkpoint time were greater than the elapsed time. No other issues.

Beta_OET1_0000299_xEBGP-L_rig.004
CPU time at last checkpoint 00:12:21
CPU time 00:13:04
Elapsed time 00:05:03
Estimated time remaining 00:34:31
Fraction Done 60%

I didn't get an opportunity to see if this happened on any other units.
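
For anyone who wants to check their own units for the same thing, here is a minimal sketch of the comparison, using the figures quoted above. The dictionary keys and function names are made up for illustration; this does not read anything from the BOINC client.

```python
def hms_to_seconds(text):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Values copied by hand from the task properties quoted above.
# The key names are hypothetical, not BOINC field names.
properties = {
    "cpu_time_at_last_checkpoint": "00:12:21",
    "cpu_time": "00:13:04",
    "elapsed_time": "00:05:03",
}

def cpu_exceeds_elapsed(props):
    """Return True when either CPU figure is larger than the elapsed time."""
    elapsed = hms_to_seconds(props["elapsed_time"])
    return (
        hms_to_seconds(props["cpu_time"]) > elapsed
        or hms_to_seconds(props["cpu_time_at_last_checkpoint"]) > elapsed
    )

# How far the CPU time runs ahead of the elapsed time, in seconds.
excess = hms_to_seconds(properties["cpu_time"]) - hms_to_seconds(properties["elapsed_time"])
print(cpu_exceeds_elapsed(properties), excess)
```

The check only flags the mismatch; it says nothing about which of the two counters is the one that went wrong after the restart.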

[Jan 9, 2015 2:23:43 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

The Result Log is still disconcerting at a restart. This one ran for 38min before I restarted it (LAIM off), said "Starting task 0,CPU time is 0.000000" at the restart, then ran just 18min to completion (so I assume it actually restarted correctly from a checkpoint). It ran as Quorum 1 and went Valid. The research result may be good, but this behaviour is likely to cause more forum queries from crunchers who come across it.

Result Name: BETA_OET1_0000299_xEBGP-L_rig_0277_0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[08:02:15] Number of tasks = 1
[08:02:15] Starting task 0,CPU time is 0.000000
[08:02:15] ./ZINC01563325.pdbqt size = 32 7 ../../projects/www.worldcommunitygrid.org/beta20.xEBGP-L_rig.pdbqt size = 2470 0
[08:40:31] Number of tasks = 1
[08:40:31] Starting task 0,CPU time is 0.000000
[08:40:31] ./ZINC01563325.pdbqt size = 32 7 ../../projects/www.worldcommunitygrid.org/beta20.xEBGP-L_rig.pdbqt size = 2470 0
[08:58:15] Finished task #0 cpu time used 3222.906750
08:58:15 (5804): called boinc_finish

</stderr_txt>
]]>
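
A rough way to sanity-check the restart from the log itself is to compare the wall-clock gaps between the timestamps with the CPU time reported at the end. The sketch below does that for the lines quoted above. It assumes the task was running the whole time between the timestamps, that it used a single thread, and that the run did not cross midnight; the variable names are illustrative only.

```python
import re
from datetime import datetime

# Relevant lines copied from the result log quoted above.
LOG = """\
[08:02:15] Number of tasks = 1
[08:02:15] Starting task 0,CPU time is 0.000000
[08:40:31] Number of tasks = 1
[08:40:31] Starting task 0,CPU time is 0.000000
[08:58:15] Finished task #0 cpu time used 3222.906750
"""

def parse_time(stamp):
    """Parse an HH:MM:SS timestamp (same-day runs only)."""
    return datetime.strptime(stamp, "%H:%M:%S")

# Every "Starting task" line marks the beginning of a run segment.
starts = [parse_time(s) for s in re.findall(r"\[(\d\d:\d\d:\d\d)\] Starting task", LOG)]

finish = re.search(r"\[(\d\d:\d\d:\d\d)\] Finished task #\d+ cpu time used ([\d.]+)", LOG)
finished_at = parse_time(finish.group(1))
cpu_seconds = float(finish.group(2))

first_run = (starts[1] - starts[0]).total_seconds()    # before the restart
second_run = (finished_at - starts[1]).total_seconds() # after the restart
wall_total = (finished_at - starts[0]).total_seconds()

print(first_run, second_run, wall_total, cpu_seconds)
# If the final CPU time is larger than the second segment alone, it must
# include work done before the restart.
print(cpu_seconds > second_run)
```

Under those assumptions the reported CPU time is much larger than the post-restart segment, which fits the guess that the task resumed from a checkpoint even though the log prints "CPU time is 0.000000" at the restart.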

[Jan 9, 2015 9:57:23 AM]

Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3315
Status: Offline


Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

My 298 finished and validated but checkpoints are simply too far apart.
45 minutes is way too much.

----------------------------------------

- AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
- AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
- AMD Ryzen 7 7730U 8C/16T 3.0 GHz

[Jan 9, 2015 10:35:55 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

"My 298 finished and validated but checkpoints are simply too far apart.
45 minutes is way too much."

In an ideal world I would agree with you, but at least this isn't as bad as CEP2 which can go for hours between checkpoints.

It would be good if it was more often, but at this level I doubt we're going to get the situation where machines are turned on and off again before a checkpoint occurs. If the WUs get longer, or if the techs feel it's not too onerous to do, I'd like to see more checkpoints as well. But I personally feel that we can live with it at this level.

Just my 2p'th.

Edit: spelling/grammar.

----------------------------------------
[Edit 1 times, last edit by Former Member at Jan 9, 2015 10:43:52 AM]

[Jan 9, 2015 10:42:25 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

Eventually, in this situation of one-job flex tasks with forced checkpointing, a choice will have to be made about how often to checkpoint [or whether there are still technically opportune moments in the simulation], so as not to inundate a 4-, 8- or 16-core machine with checkpoint saves. OTOH, if the checkpointing for flex jobs respects the write-to-disk setting, skipping a 'forced' checkpoint on the fast machines would be ideal. 'Too much' at 45 minutes is of course massively better than never in 48 hours :)

Anyway, no issue here, since I use hibernation extensively, with only the monthly reboot for Windows, and run reboot-free on Linux using KSplice. :D
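
To make the idea concrete, here is a minimal sketch of a checkpoint decision that honours a write-to-disk interval and only falls back to a forced checkpoint when no natural point has come up for a long time. Every name and number in it is hypothetical; it is not the logic used by the science application or by BOINC.

```python
def should_checkpoint(seconds_since_last, min_disk_interval,
                      at_natural_point, forced_interval):
    """Decide whether to write a checkpoint now.

    seconds_since_last: time since the previous checkpoint was written
    min_disk_interval:  the user's "write to disk at most every N seconds"
    at_natural_point:   True if the simulation is at an opportune moment
    forced_interval:    after this long, checkpoint even without one

    All parameter names are hypothetical, chosen for illustration.
    """
    if seconds_since_last < min_disk_interval:
        # Too soon after the last write: skip, even if a checkpoint
        # would otherwise be forced. This protects many-core machines
        # from being flooded with saves.
        return False
    if at_natural_point:
        return True
    # No natural point available: only force a save once the long
    # interval has passed.
    return seconds_since_last >= forced_interval


# A fast machine reaches a natural point 30 s after its last save and skips it.
print(should_checkpoint(30, 60, True, 600))
# A slower machine has gone 700 s without any natural point and is forced.
print(should_checkpoint(700, 60, False, 600))
```

The trade-off is the one described above: a larger forced interval means fewer disk writes but more work lost if the machine is switched off.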

[Jan 9, 2015 10:52:49 AM]

Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3315
Status: Offline

Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

Indeed 45 minutes is better than CEP2 or no checkpoint at all.
I may hibernate but many don't. I am just afraid there may be lots of hours lost across the grid with this kind of task.


[Jan 9, 2015 12:11:40 PM]

I need a bath
Senior Cruncher
USA
Joined: Apr 12, 2007
Post Count: 347
Status: Offline


Re: Beta Test - Outsmart Ebola Together - v7.14 - Jan 7, 2015 [ Issues Thread ]

I have noticed that Beta units take up nearly as much CPU resources while they are "waiting to run" as when they are actually running.
Is this a problem?

[Jan 9, 2015 7:05:40 PM]
