Thread: Questions about participation drive for CEP2
Total posts in this thread: 152 (viewed 497,174 times)
Bearcat
Master Cruncher
USA
Joined: Jan 6, 2007
Post Count: 2803
Re: Questions about participation drive for CEP2

Both the quad core and the 8 core (no HT) are running Ubuntu 10.10. The quad has 8GB RAM while the 8 core has 4GB. My 8-core/16-thread machine went back to Win7; it has 12GB of RAM in triple channel. The others are running in dual channel. I thought about filling the rest of the slots for quad channel, but it's not worth investing in the older technology. I'm trying to sell both of my Harpertown PCs to build a Sandy Bridge PC.
----------------------------------------
Crunching for humanity since 2007!

[Jan 18, 2011 11:30:42 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Re: Questions about participation drive for CEP2

Ingleside,

If this works and provides improved efficiency, great, but we tried something similar on Linux to link files across slots, and it failed, corrupting results. knreed commented that it contravened BOINC's slot-isolation design.

Hmm, if I'm not misremembering, the Linux approach tried to be "clever" by linking any non-zero files that were identical under /slots/, and possibly across the whole BOINC data directory. But it's possible that one or more temporary files, apart from /qcaux, are identical at task start; if those got linked, any divergence later in the run would lead to corruption...

Looking forward to hands-on reports. Anything that could also reduce the "used" and "reported" BOINC data-directory storage might help improve the RAMdisk approach as well.

Using linkd won't decrease disk usage.
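
For illustration only, a minimal sketch of the cross-slot linking idea described above, and why it breaks. This is a reconstruction under assumptions, not the script that was used; the data path and hashing scheme are mine.

# CAUTION: sketch of the failed approach discussed above -- shown to make
# the corruption mechanism concrete, not as something to run on a live client.
import hashlib
import os

SLOTS_DIR = "/var/lib/boinc-client/slots"  # assumed BOINC slots location

def digest(path, chunk=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def link_identical_files(root):
    """Hard-link non-zero files under `root` whose contents currently match."""
    seen = {}  # (size, sha256) -> first path seen with that content
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size == 0:
                continue  # only non-zero files, as described above
            key = (size, digest(path))
            if key in seen:
                # The dangerous step: files that are equal only *at task
                # start* (e.g. temporary files) now share one inode, so a
                # later in-place write by one task can silently alter the
                # other task's data.
                os.remove(path)
                os.link(seen[key], path)
            else:
                seen[key] = path

if __name__ == "__main__":
    link_identical_files(SLOTS_DIR)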
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Jan 18, 2011 11:52:17 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: Questions about participation drive for CEP2

Dear Hypernova,
We are sorry to hear about the problems you experienced. We had a few server glitches in the beginning which we fixed back then, so your upload problem may already be ironed out by now. If not, please try our current beta test, where we replace HTTP with HTTPS.
It would be helpful for us and the IBM team if you could give specifics on your 'simply error out' (in case it still persists) – maybe there is something we can do about it, or at least learn from it.

Best wishes,

Your Harvard CEP team


I should have a new rig (Luna) up and running this coming weekend. I will experiment again with CEP2 and let you know what happens.
----------------------------------------

[Jan 20, 2011 11:22:54 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: Questions about participation drive for CEP2

Luna has started and is in operation now, crunching CEP2 exclusively.
It has a 980X running 12 threads at 4.2 GHz, with 4GB of DDR3 at 1900 MHz.
I already have 15 valid WUs and another 4 in PV (pending validation). The WUs are all of the 635 version, with runtimes varying between 3.8 and 5 hours per WU; one lasted 6 hours. Too early for big stats, but it looks promising: no errors so far. :)
I got some weird messages, but they have not interrupted the crunching or generated errors, so I ignore them.
I will keep this machine running CEP2 exclusively and will post a new stats update in about 10 days, once the flow with the WCG servers has stabilized.
----------------------------------------

[Jan 22, 2011 7:04:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Questions about participation drive for CEP2

Something changed. Whatever "drives" this, the daily number of tasks has stayed fairly level over the past weeks, but the average runtime has dropped by about 1.25 hours per task, and it won't be Luna doing this ;>). The shorter completion times translate to fewer runtime years contributed overall, down from 15 to 13 per day. A fraction of that may be the DDDT2 rains acting as a diversion, but I can't see that as the main cause.
Date        Hrs/Task  Tasks/Day
JAN.01-11 8.88457 14,589
JAN.02-11 8.88461 15,125
JAN.03-11 8.96143 15,384
JAN.04-11 8.98109 14,581
JAN.05-11 9.00964 13,954
JAN.06-11 8.98414 13,472
JAN.07-11 8.90282 13,635
JAN.08-11 8.90202 13,230
JAN.09-11 8.93609 13,926
JAN.10-11 8.86320 14,571
JAN.11-11 8.84945 14,539
JAN.12-11 8.80440 14,308
JAN.13-11 8.79029 14,644
JAN.14-11 8.86567 14,748
JAN.15-11 8.71518 15,167
JAN.16-11 8.68457 15,103
JAN.17-11 8.53599 15,380
JAN.18-11 8.15219 15,313
JAN.19-11 7.91847 15,641
JAN.20-11 7.77700 14,671
JAN.21-11 7.61555 15,022
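
For anyone checking the arithmetic: runtime-years per day is just hours/task times tasks/day, divided by 8,760 hours per year. A quick sketch against the first and last rows above:

# Runtime-years contributed per day = (hours/task * tasks/day) / 8760 hours/year.
rows = [("JAN.01-11", 8.88457, 14589),   # first row of the table above
        ("JAN.21-11", 7.61555, 15022)]   # last row
for date, hrs_per_task, tasks_per_day in rows:
    print(f"{date}: {hrs_per_task * tasks_per_day / 8760:.1f} runtime-years/day")
# Prints ~14.8 for Jan 1 vs ~13.1 for Jan 21 -- the drop from ~15 to ~13 per day.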

[Jan 22, 2011 7:25:02 PM]
anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
Re: Questions about participation drive for CEP2

Sekerob, the reason is simple. cleanenergy recently (within the last 2 weeks) said that their current batch of WUs are shorter-running jobs. Secondly, shorter jobs plus the limit on how many CEP2 WUs can be on each system means less runtime, period.

Personally, on my 16-thread system I have it set to a 6-CEP2-WU limit and my cache is 1.2 days. So no matter how short or long the CEP2 WUs are, I will only crunch 6 per 1.2 days on that system. I chose 6 because most WUs were taking 10-12 hrs for me; now I notice a lot of WUs finishing in 6-8 hrs. (A quick illustration follows.)
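
Taking those numbers at face value, a sketch of why shorter WUs under a fixed per-host limit mean less CEP2 runtime per day:

# With a fixed limit of 6 CEP2 WUs per 1.2-day cache, the number of WUs
# crunched per day is constant, so daily CEP2 runtime scales with WU length.
LIMIT_WUS, CACHE_DAYS = 6, 1.2
wus_per_day = LIMIT_WUS / CACHE_DAYS          # = 5 WUs/day, whatever their length
for label, hours_per_wu in [("old ~11 h WUs", 11), ("new ~7 h WUs", 7)]:
    print(f"{label}: {wus_per_day * hours_per_wu:.0f} CPU-hours of CEP2 per day")
# old ~11 h WUs: 55 CPU-hours/day vs new ~7 h WUs: 35 CPU-hours/day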
----------------------------------------

[Jan 22, 2011 7:45:17 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Reply to this Post  Reply with Quote 
Re: Questions about participation drive for CEP2

Got a link for that statement about "shorter"? Was it cleanenergy or one of the techs posting this notice? It thoroughly escaped me, but for going over the stats and seeing these shorter average runtimes...

And time for you to up the number to cache, since you read it ;>)
[Jan 22, 2011 8:01:53 PM]
anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
Re: Questions about participation drive for CEP2

Sekerob, apparently I lied/mistyped. It was actually a little over a month ago that cleanenergy stated this little fact. See the snippet below along with the link.
I guess I could increase the limit from 6 to 7. I never saw any noticeable difference in performance (wasted CPU time) when crunching 3 at a time vs. 1 or 2; my goal was simply to limit my machines to crunching a max of 3 at a time.


Disk and memory requirements also increase with system size. Our library currently contains about 10 million candidate molecules, which are for now sorted by size. They get sent to the grid in (more or less) continuously ascending order (the current max size is 499 of some complicated unit).
The next batch of 2.7 million candidates is made up of recently generated molecules with sizes between ~250 and 499. Hence, there will again be smaller ones, out of order, but they will go up to the same size as the currently biggest molecules. So there will be no big jump in size anytime soon. We will be able to make some more quantitative statements when we compile our internal statistics early next year.


http://www.worldcommunitygrid.org/forums/wcg/...ad,30547_offset,20#307597

Edit: Instead of increasing my CEP2 WU limit from 6 to 7, I just decreased my cache size a little bit.
----------------------------------------

----------------------------------------
[Edited 1 time, last edit by anhhai at Jan 22, 2011 8:57:11 PM]
[Jan 22, 2011 8:42:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Reply to this Post  Reply with Quote 
Re: Questions about participation drive for CEP2

Thanks, now I remember that bit... so it will creep up again, but with 2.7 million candidates (before quorum duplication?) at 15k units daily, that is a long way to go.

Increase your cached number by the factor by which the jobs now run shorter and you'll likely maintain the same "concurrency". You could probably plot that as a curve using 1 core and the cache size, then multiply by the number you want to run concurrently. In practice the scheduler seems tuned to priority-backfill any CEP2 shortfall on the next work-fetch calls.

Sample: cache of 1.5 days, runtime of 8 hours: 36 hours / 8 = 4.5, times 2 cores (half of a quad) makes 9 **. This has to be mixed with sciences that are not limited in supply. I have not tested this, but over time it could balance out. My 1.2-day cache worked with a setting of 5 to get a quasi 2-out-of-4 on the quad, but that was with longer-running work units. Then I had to cut it to 4, which also led to periods where no CEP2 ran at all, so I just manually kicked them up from the rear of the queue (though only when checking in a few times a day), which on completion hastened the backfill again.

** As chosen in the New [CEP2] Project Setting
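
That rule of thumb as a small sketch (the helper name is mine; the limit itself is whatever you pick in the project setting):

# Suggested per-host CEP2 limit = (cache length in WU runtimes) * cores to keep busy.
def suggested_cep2_limit(cache_days, runtime_hours, concurrent_cores):
    wus_per_core = cache_days * 24 / runtime_hours  # WUs one core works through per cache fill
    return round(wus_per_core * concurrent_cores)

print(suggested_cep2_limit(1.5, 8, 2))   # the sample above: 36 / 8 = 4.5, * 2 -> 9
print(suggested_cep2_limit(1.2, 11, 2))  # -> 5, matching the 1.2-day cache with longer WUs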

edit: added ** reference
----------------------------------------
[Edited 1 time, last edit by Former Member at Jan 23, 2011 1:46:47 AM]
[Jan 22, 2011 9:26:08 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Re: Questions about participation drive for CEP2

I use a 3-day cache as the standard for all machines and all projects.
----------------------------------------

[Jan 22, 2011 10:03:21 PM]