| World Community Grid Forums
Thread Status: Active Total posts in this thread: 34
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges:
|
If "no heartbeat" is a frequent problem, it might be worth looking into (in order of easiest) putting in an additional drive to use for BOINC data and/or going to RAID.

Well, even with the BOINC data directory (or only the slots directories) on a 2-disk RAID and the OS on a separate disk, I have little trouble triggering "no heartbeat" if I start any disk-heavy operation. Even when the computer was left alone, the RAID setup on an i7-920 with 6 GB RAM running 8 CEP2 gave "no heartbeat" after a couple of days.

I have also tried pre-starting CEP2, similar to what Rickjb recommended. I don't remember the exact number, but AFAIK with around 20 unzipped CEP2 tasks, hitting the disk tab was enough to trigger "no heartbeat". This happened even after all CEP2 tasks had been removed from memory and other work was running, which clearly shows that the problem is the number of files, not how many GB they use.

BTW, even without RAID, I've had no problems (*) running CPDN, even though it also uncompresses around 350 MB of data on the first start of a model. The big difference is that CPDN uses only roughly 100 files, not 6466 like CEP2 does.

(*) I'm not counting problems due to buggy WUs and so on.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
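For anyone who wants to check the "number of files, not GB" observation on their own machine, here is a minimal sketch that counts files and total bytes under a directory. The function name `count_files_and_bytes` is made up for illustration, and the directory you point it at (for example one of the BOINC slots directories) depends on your installation; nothing here is part of BOINC itself.

```python
import os


def count_files_and_bytes(root):
    """Walk root recursively and return (number_of_files, total_size_in_bytes)."""
    n_files = 0
    n_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable mid-scan; leave it out.
                continue
            n_files += 1
            n_bytes += size
    return n_files, n_bytes
```

Calling it on a slot directory that holds an unpacked CEP2 task and on one that holds a CPDN model would let you compare file counts against sizes directly, which is the comparison the post above is making.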
||
|
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges:
|
I thought I might get some response from my input re. the infrequent checkpoints from CEP2 still deterring many members from running it, but the thread seems to have wandered a bit OT.
"Infrequent checkpointing" is a bit OT too, but it does discuss "Why [am I] asking? The contribution for this science is positively dismal compared to what it could be...", stated by Sekerob in the head posting. [OT]I've never dedicated a majority of my machines' cores/threads to CEP2 because (a) I want to support other sciences too, and (b) running many instances of CEP2 is known to slow down the whole machine. This could be because many parts of the O/S (Windows etc) file I/O system are single-threaded and running too many CEP2 threads causes bottlenecks. If you run other sciences on, say, 50% of your CPU cores/threads, you may find that the "no heartbeat" issues just go away, and the machines will also be more productive.[/OT] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quote:
I thought I might get some response from my input re. the infrequent checkpoints from CEP2 still deterring many members from running it, but the thread seems to have wandered a bit OT. "Infrequent checkpointing" is a bit OT too, but it does discuss "Why [am I] asking? The contribution for this science is positively dismal compared to what it could be...", stated by Sekerob in the head posting.

lolll...yes, I wandered "off-topic" if your intent was strictly to push the CEP2 project...well, really to push the corporation that wrote the analysis core (Q-Chem, which performs the quantum mechanics calculations) of this phase - 2 - of the CEP project to change their code...

Well, from my personal experience with both writing code and with dealing with other corporations' software (and their lead product, at that)...I would venture that is going to be tough to get done. Unless you pony up...oh, say half a million dollars - at a guess. (But I don't reckon either Q-Chem or Harvard would be averse to the idea...assuming the rewrite could be completed, tested, and rolled out while there was still sufficient work units left to make it worthwhile.)

My apologies for "wandering a bit OT"...I tend to focus upon the pragmatic...the deliverables.

(Edit: I'd add that I was addressing

Quote:
This could be because many parts of the O/S (Windows etc) file I/O system are single-threaded and running too many CEP2 threads causes bottlenecks. If you run other sciences on say 50% of your CPU cores/threads you may find that the "no heartbeat" issues just go away, and the machines will also be more productive.

RAID or at least a separate data drive allows the multithreading capabilities of modern operating systems [Windows et al] to avoid having to queue I/O in a single thread.

In the case of I/O contention, it doesn't matter if you're running just CEP2 or CEP2 and other sciences - if they demand I/O at the same time as another science or at the same time as the operating system, you get I/O collisions. Said collisions will, of course, be exacerbated if you have a single science starting out from 0% across multiple cores...which I don't have any issues dealing with using separate RAID sets for the system and data drives.)

[Edit 1 times, last edit by Former Member at Feb 14, 2013 7:53:55 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
My bottom line purpose - if it isn't apparent - is to make it clear that you can pack a modern CPU running a modern operating system chock full of CEP2 tasks without issues. (A side note: Do tell your antivirus to exclude your BOINC data directory and BOINC executable directory no matter what science you're running. An antivirus program that puts file locks on a necessary BOINC file can cause problems.)
O/T: You should run as much CEP2 as you can; curing cancer, for example, is all well and good, but eliminating the power plants that burn hydrocarbons or use fissionables with their attendant radioactive waste products and "accidents" in favor of solar energy will reduce the mutagens and carcinogens being pumped into our environment - and thus eliminate their ability to cause cancer....i.e., you've moved from curing cancer into cancer prevention. To use an analogy, I'd rather avoid being shot no matter how good the emergency room doctors are at patching GSW victims up these days. Given my druthers. So (he says petulantly) ignore all advice to run multiple sciences and run CEP2 round, bouncing objects to the wall! /end O/T |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Dear ibsteve2u et al,

thanks so much for your enthusiasm - we greatly appreciate it! We are very grateful for all the work you and all the WCG crunchers put into our project. We are definitely trying to attract more people to the project. We would, however, refrain from ranking the value of the different research projects. They are all great and they all work towards the common good. We think that people should support whatever science they personally relate to - if that ends up being CEP that's awesome, if it's one of the other projects, that's great, too. It's not a competition for us.

Dear Rickjb,

yes, improving the checkpointing is still on our TODO list, but that list is so long and our manpower is so limited that we just have to prioritize, and there are other things that are just more pressing than the checkpointing.

Best wishes from
Your Harvard CEP team |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
For those who do not want to get into confusing micromanagement stress, heartbeat fears and whatnot: users of app_config.xml should understand that there is a way to get newer tasks to the head of the queue in, for instance, a two-science selection scenario [presumes a reasonably regular supply of work]. Expanding the original example of 2 CEP2 on an octo with a 1.5-day buffer, it's as simple as limiting all other sciences to <max_concurrent>6</max_concurrent>. Visit the OP post for the sample: http://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=411882 . The sky is the limit, particularly if we get the function to re-read app_config.xml [like we can with cc_config.xml] and adjust on the fly, rather than having to restart the client [going to see if this can be pushed ahead with the devs].
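As a rough illustration of the "limit the other science to 6" idea, the sketch below builds the text of an app_config.xml from a mapping of application name to concurrency cap. It assumes the file uses one `<app>` block per application with `<name>` and `<max_concurrent>`, as in the sample elsewhere in this thread; the helper name `build_app_config` is hypothetical, and where the file has to be saved depends on your BOINC installation.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits):
    """Return app_config.xml text with one <app> block per (name, limit) pair."""
    root = ET.Element("app_config")
    for name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if hasattr(ET, "indent"):
        # Pretty-print where available (Python 3.9+); the content is the same either way.
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


# Example for an 8-thread machine: cap the other science (here "hcc1") at 6.
print(build_app_config({"hcc1": 6}))
```

Note that `max_concurrent` is set per application, so with more than one "other" science each would need its own entry; treat the numbers as something to tune for your own machine.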
----------------------------------------[Edit 1 times, last edit by Former Member at Feb 18, 2013 10:08:20 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quote:
Dear ibsteve2u et al, We would, however, refrain from ranking the value of the different research projects. They are all great and they all work towards the common good. We think that people should support whatever science they personally relate to - if that ends up being CEP that's awesome, if it's one of the other projects, that's great, too. It's not a competition for us. Best wishes from Your Harvard CEP team

My apologies; I will temper my efforts with your more...neutral...perspective. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Can't draw enough pleasure out of the app_config.xml... doing 2-way science on the octo, 2 for CEP2 and 6 for HCC, MinB set to 1.5 days and MaxAB to 0.0, but the device profile set to only allow 5 CEP2 to buffer, so they are always rather fresh. What? you'd say. Well, when you configure the file to only allow 6 concurrent HCC1 for CPU and let the others be filled out by anything else, which is either CEP2 or Beta, they'd always have first pickings of the remaining 2 cores. The app_config.xml can't be any simpler:

<app_config>
  <app>
    <name>cep2</name>
    <max_concurrent>2</max_concurrent>
  </app>
  <app>
    <name>hcc1</name>
    <max_concurrent>6</max_concurrent>
  </app>
</app_config>

No restrictions on Beta, so should 8 arrive, they can all have the joy at the first opportunity when any of the slots come free [at least 2]; limiting the buffered CEP2 increases the chance of that happening, all by itself. The buffer settings and HCC1 durations make sure that the client will want to backfill the cache about every 3 hours, short enough to replenish [first call] from the CEP2 feeder and top up with a remainder of HCC1, or BETA, even before asking the CEP2 hopper. Crunch On.

P.S. CEP2 almost hit 20 years yesterday... 19.6 years. See http://bit.ly/WCGCE1 (If you like it, hit the Like button... it increases the chance that Photobucket and social media searchers hit on the project sooner... Thx) |
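The arithmetic behind "first pickings of the remaining 2 cores" can be sketched as follows. The snippet parses the two-app file from this post and reports how many threads a given application's cap leaves for everything else. The function names `read_limits` and `cores_left_by` are hypothetical, the total of 8 threads is the "octo" from the post, and the sketch only does the subtraction; it does not model how the BOINC client actually schedules tasks.

```python
import xml.etree.ElementTree as ET

# The two-app configuration from the post above.
APP_CONFIG = """<app_config>
  <app><name>cep2</name><max_concurrent>2</max_concurrent></app>
  <app><name>hcc1</name><max_concurrent>6</max_concurrent></app>
</app_config>"""


def read_limits(xml_text):
    """Return {app_name: max_concurrent} for every <app> block that sets both."""
    limits = {}
    for app in ET.fromstring(xml_text).findall("app"):
        name = app.findtext("name")
        limit = app.findtext("max_concurrent")
        if name and limit:
            limits[name.strip()] = int(limit)
    return limits


def cores_left_by(app_name, limits, total_cores):
    """Threads that app_name's cap leaves free for all other applications.

    An application without a cap could occupy every thread, so it leaves 0.
    """
    return max(total_cores - limits.get(app_name, total_cores), 0)


limits = read_limits(APP_CONFIG)
print(limits)
print(cores_left_by("hcc1", limits, total_cores=8))
```

With HCC1 capped at 6 on 8 threads, the subtraction gives the 2 threads the post describes as open to CEP2 or Beta.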
||
|
|
Yarensc
Advanced Cruncher USA Joined: Sep 24, 2011 Post Count: 136 Status: Offline Project Badges:
|
I've been running the app_config on 3 computers for a week or so, and it works great for not having to micromanage all the CEP WUs. Thank you Sekerob, as always, for all your awesome charts, especially the pie chart with all the projects.
|
||
|
|
mmstick
Senior Cruncher Joined: Aug 19, 2010 Post Count: 151 Status: Offline Project Badges:
|
I had no idea the heartbeat thing existed. I've been running three Phenom II X6 machines with just a single hard drive and no SSD and computing six CEP2 work units simultaneously for a while now. My FX-8120 machine though is running HCC since it has a Radeon HD 7950.
I've returned to WCG after 9 months of distributed X264 10-bit encoding of my personal archive of TV series and movies. |
||
|
|
|