This topic has been viewed 4922 times and has 21 replies.
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Re: Most cores to crunch this project

No.
This is a "feature" of the WUs in this project.

Each WU is made of 16 jobs of various durations, and at each job change some amount of CPU time is lost for some mysterious reason.

This problem was reported early during beta testing of this project but, although it may have improved slightly, it has never been eliminated completely. Checking the most recent CEP2 jobs returned by my quad, the loss averages 20 minutes per job: 15 for a very short one (4h30) and 30 for a very long one (>10h).

Your system is not the problem. The difference is not due to overhead: the CPU time total actually goes backward between the end of a job and the beginning of the next one!
I have closely watched scores of CEP2 jobs via BoincTasks to come to this conclusion.
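For scale, a rough back-of-the-envelope estimate of the per-WU cost, using the figures above (the assumption that the loss occurs at each of the 15 job-to-job transitions is my reading, not something the project has confirmed):

```python
# Figures taken from the post: 16 jobs per WU, ~20 minutes of CPU time
# lost on average at each job change.  Assumed: one loss per transition
# between consecutive jobs, i.e. 15 transitions per WU.
jobs_per_wu = 16
avg_loss_min = 20
transitions = jobs_per_wu - 1
total_loss_min = transitions * avg_loss_min
print(f"estimated loss per WU: {total_loss_min} min (~{total_loss_min / 60:.0f} h)")
```

That would put the overhead at roughly five hours of CPU time per WU, which is far from negligible for tasks of this length.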

Edit: Emphasized the key point of this post...
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
----------------------------------------
[Edited once; last edit by JmBoullier at Oct 26, 2011 2:21:38 AM]
[Oct 25, 2011 8:48:08 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Most cores to crunch this project

Just to clarify this a little: CPU time is always less than wall-clock (WC) time, because not all the work your computer does is CPU-bound. I/O is a prominent reason, but a necessary evil: after all, we have to save the data of a calculation to do something with it in the end. Other use of the CPU (e.g., by the OS or other software) also adds to the discrepancy between CPU and WC time. If a number of simultaneous WUs all try to perform I/O, they basically have to wait for each other; in that case the CPU time stands still while WC time moves on, which leads to bigger discrepancies. The more WUs run simultaneously, the higher the probability that some I/O holdup will develop.
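The CPU-time vs. wall-clock distinction is easy to demonstrate; here is a minimal sketch (Python chosen for illustration: `time.process_time` counts only CPU time used by the process, while `time.perf_counter` tracks wall-clock time, and the `sleep` stands in for an I/O wait):

```python
import time

def busy_work(n: int) -> int:
    """Pure CPU work: both CPU time and wall-clock time advance together."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()
cpu_start = time.process_time()

busy_work(2_000_000)
time.sleep(0.5)  # stands in for I/O wait: wall-clock advances, CPU time does not

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

print(f"wall-clock: {wall:.2f} s, CPU: {cpu:.2f} s, efficiency: {cpu / wall:.0%}")
```

The reported efficiency drops below 100% purely because of the wait, even though no CPU work was wasted — the same effect, at larger scale, that I/O contention between concurrent WUs produces.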
Best wishes
Your Harvard CEP team
[Oct 25, 2011 2:58:28 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Most cores to crunch this project

Can we compensate for that decrease by adding hard disks / SSDs in RAID 0?
[Oct 25, 2011 4:51:24 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Most cores to crunch this project

Got it running in a "virtual" SSD on Linux using the zramswap-enabler; this way it only goes to the hard disk when checkpointing. The amazing thing continues to be that CEP2 (and DSFL, both using a staging wrapper construct) runs 99% efficient with a regular HD on Windows, yet booting the same device into Linux I'm barely getting 95%, and only when really hands-off and running no more than 2 concurrent tasks on a 4-core system. DSFL then has a hard time reaching 98%, versus the 99.5% seen under W7. The Linux setup is truly bare, no GUI. I've launched a query over at the Ubuntu support forum asking about fstab parameters that could optimize things: noatime, relatime, write-back delay, etc. Nothing seems to be making a dent in the poor efficiency, which, to repeat, occurs only under Linux with the 3.0.1 kernel.
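For anyone wanting to experiment with those fstab parameters, a hypothetical entry might look like this (device, mount point, and filesystem are placeholders; `commit=60` is just one way to delay journal write-back on ext3/ext4):

```shell
# Hypothetical /etc/fstab entry for a dedicated BOINC data partition.
#   noatime   - skip access-time updates on every file read
#   commit=60 - flush the ext4 journal every 60 s instead of the 5 s default
/dev/sdb1  /var/lib/boinc-client  ext4  defaults,noatime,commit=60  0  2
```

Whether any of these options helps depends on the workload's checkpointing pattern; as noted above, none of them made a visible difference here.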

--//--
[Oct 25, 2011 5:02:23 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Most cores to crunch this project

Had a quick reply. It was suggested to give Linux its own true partition or drive, instead of the way I've been running it from the get-go: it sits inside an NTFS drive, where the LiveCD installation marks off an area that it uses to overlay its ext3/4 filesystem.

Two links were provided:

http://www.phoronix.com/scan.php?page=article...buntu_wubi_1010&num=1
http://www.phoronix.com/scan.php?page=article...inux_2638_large&num=1

--//--
[Oct 25, 2011 5:32:12 PM]
Bearcat
Master Cruncher
USA
Joined: Jan 6, 2007
Post Count: 2803
Re: Most cores to crunch this project

Thanks for the replies. If I weren't so close to getting sapphire, I would try a second hard drive for BOINC only; that way both drives spin up for their own reasons instead of sharing one disk. I increased my write interval to 120 seconds yesterday to see what happens: same or worse. I have an older 30 GB SSD I will try later just to see; it was cheap, so if it dies it's no big deal.
One thing I do, though, is leave one thread free for other processes. I'm hoping this won't interfere with BOINC.
----------------------------------------
Crunching for humanity since 2007!

[Oct 25, 2011 9:08:57 PM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Re: Most cores to crunch this project

Quoting Bearcat: "One thing I do though is leave one thread free for other processes needed. Am hoping doing this wont interfere with boinc."
I do this on a few systems: 7 of 8 threads used, but I also run FreeHal and WUProp. Overall processor usage is ~94%, varying from 91% to 97%. Using 7 of 8 threads is in itself 87.5% of the overall CPU, so at most I am really only losing around half of one thread. As well as improving system responsiveness, I think it reduces errors and might expedite the other threads, reducing the loss even further. On 12-thread systems the relative loss would be smaller, under 5% of the CPU. For CEP2, I think it would also better facilitate disk I/O.
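The arithmetic behind those percentages, using the numbers from the post (the ~94% overall usage is the observed figure, not a derived one):

```python
# 7 of 8 threads reserved for BOINC; ~94% overall CPU usage observed.
threads_total = 8
threads_free = 1
reserved_share = (threads_total - threads_free) / threads_total  # BOINC's share

observed_usage = 0.94             # observed overall processor usage
idle_fraction = 1 - observed_usage
idle_threads = idle_fraction * threads_total  # idle capacity in thread units

print(f"BOINC share: {reserved_share:.1%}")
print(f"net idle: {idle_fraction:.0%} of the CPU = {idle_threads:.2f} threads")
```

The net idle capacity comes out to roughly half a thread, since background processes soak up part of the reserved thread — which is why the real loss is smaller than the raw 1/8 reservation suggests.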
[Oct 25, 2011 11:35:43 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Most cores to crunch this project

Dear warmachine,
yes, multiple hard drives are successfully used to alleviate I/O bottlenecks. We recently extended our in-house cluster with nodes set up this way. But we certainly cannot ask WCG users to buy additional hardware just to maximize the CEP2 return.
Best wishes
Your Harvard CEP team
[Oct 26, 2011 4:11:53 PM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Re: Most cores to crunch this project

On systems with 12+ threads, a faster HDD would improve the performance of most WCG projects; it's just that CEP2 would benefit the most. Elsewhere in the BOINC world, ClimatePrediction is in the same situation. Anyway, people building such systems will probably want a good hard drive, or several.
Installing BOINC so that it uses a different HDD than the operating system helps, as does using an SSD (though they are prone to failure), and obviously using several drives in RAID would improve things further.
Ultimately it's about a well-balanced system with good specs not just for the CPU but for the RAM, the motherboard/chipset and, increasingly for many projects, high-performing HDDs. In the future, GPU capabilities will become more important.
GL
[Oct 26, 2011 7:02:32 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Re: Most cores to crunch this project

@cleanenergy
It seems you missed (or ignored) my post of Oct 25, 2011 8:48:08 AM, which I finished with the following:
Your system is not the problem. The difference is not due to overhead: the CPU time total actually goes backward between the end of a job and the beginning of the next one!
I have closely watched scores of CEP2 jobs via BoincTasks to come to this conclusion.
Tonight I was lucky enough to catch the end of job #2 of the CEP2 WU currently running on my quad. I watched it via BoincTasks running on my netbook, so I did not interfere with the quad myself, and CPU efficiency was about 99% on all four tasks (3 HCC WUs running alongside). Here is what I could see before and after the checkpoint between job #2 and job #3:
Before:
Difference between wallclock and CPU times: 0:03:35
Last CPU time shown by BoincTasks: 3:29:17
Just after:
Difference between wallclock and CPU times: 0:15:00
CPU time shown by BoincTasks: 3:18:00

My BoincTasks is refreshing every 15 seconds, so all these changes happened during an interval of 15 seconds.

Of course, at the same time BoincTasks started to show a blank field for CPU efficiency until the CPU times became consistent again over its computation interval.
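For reference, the size of the backward jump in those BoincTasks readings can be computed directly (times copied from the figures above):

```python
def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS time string to a total number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

cpu_before = to_seconds("3:29:17")  # CPU time just before the checkpoint
cpu_after = to_seconds("3:18:00")   # CPU time just after
lost = cpu_before - cpu_after
print(f"CPU time went backward by {lost} s (~{lost // 60} min {lost % 60} s)")
```

That is about 11 minutes of accounted CPU time vanishing within a single 15-second refresh interval, consistent with the jump in the wall-clock/CPU difference from 0:03:35 to 0:15:00.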

I think this confirms what I said earlier: chasing I/O overhead to get rid of the big differences between wall-clock and CPU times for this project is not the point (although it cannot hurt). The problem is something wrong in the master procedure which manages the 16 jobs of a CEP2 WU.
But what?
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
----------------------------------------
[Edited once; last edit by JmBoullier at Oct 26, 2011 11:45:48 PM]
[Oct 26, 2011 11:41:28 PM]