Thread Status: Active
Total posts in this thread: 22
This topic has been viewed 3966 times and has 21 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

Hi everybody,
We obviously wish and hope that you all stay with our project. The IBM team is aware that this issue is very much on the minds of our crunchers, and they will hopefully get around to resolving it soon. It is not in our hands.
From a scientific perspective the 12h-limit is not a big problem - the employed job sequence is an efficient structure for our wus, and it is nice if all the jobs finish, but we can live with incomplete work units as well. The wus were designed with this in mind (please check out the earlier thread for a more detailed discussion). After all, you start a new job after the 12h, so no overall computing time gets lost.
We would like to point out that most people still don't hit the cutoff time, and that - afaik - most projects have some sort of a time limit which is actually more restrictive than ours.
Best wishes
Your Harvard CEP team
[Sep 6, 2011 4:29:49 PM]
GIBA
Ace Cruncher
Joined: Apr 25, 2005
Post Count: 5374
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

I wonder if this project could have lighter WUs, something that could be crunched in around 6 hours at most.

It is a very important project, with very clear potential to help the research in a short time. But since it started on WCG, it has become the last one in crunchers' preferences due to its restrictions (prerequisite limits, heavy machine usage, mainly in upload/download connections, and the very long time needed to crunch most WUs on any kind of machine).

If it is possible to change the scheme, I guess the researchers would look into ways to do it ASAP, and if done, for sure the project will gain a lot of adepts and quickly increase its share of total WCG participation and of crunchers' preference. Think about it.

Wishing you luck and best regards.
----------------------------------------
Cheers! GIB@
Join BRASIL - BRAZIL@GRID team and be very happy !
http://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=DF99KT5DN1

[Sep 6, 2011 5:06:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

DSFL also has some very long crunching times. I am seeing 9 to 10+ hours for most WUs on my 'puters. The elapsed time seems to be in the 13 to 16 hour range. This is quite a bit more than I have experienced with CEP2.
[Sep 6, 2011 5:27:54 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

Dear GIBA and dkt,
Our impression is that the "opt-in" status is the biggest problem for the project in terms of participation and support. Many casual users are scared or just don't want to be bothered with a project which is not entirely standard and hence decide to pass on it, although their hardware (except for very low-end machines) could easily handle CEP. We talked to the IBM team about this, but there does not seem to be a favorable resolution. We don't want to cause problems which may make people leave WCG entirely.
Ok, and now once more to the structure of the wus: We could have cut down the number of jobs but once calcs on a particular molecule are running on a host it is very efficient to keep it running there. The later jobs in a wu are not as important as the earlier ones (although they do improve our overall computational characterization), but the point is that they run in only a fraction of the time and without any extra network traffic when we run them in sequence within the wu. The 12h limit was introduced as a compromise for users who don't want to keep a job on their machine for too long and cash in on points regularly rather than in big chunks.
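The scheduling compromise described here (a sequence of jobs per WU, with the cheaper follow-up jobs dropped once 12h of CPU time is used) can be sketched roughly as follows. The function name and job runtimes are hypothetical, for illustration only:

```python
# Sketch of the WU structure described above: a sequence of jobs on one
# molecule, where later jobs are cheaper, and the WU stops (without error)
# once 12h of CPU time is used. Runtimes are made-up illustrative numbers.

CUTOFF_HOURS = 12.0

def run_workunit(job_hours):
    """Run jobs in order until the CPU-time cutoff; return completed jobs
    and total CPU hours spent. Partial WUs are still scientifically useful."""
    completed = []
    elapsed = 0.0
    for hours in job_hours:
        if elapsed + hours > CUTOFF_HOURS:
            break  # remaining jobs are dropped; the host starts a fresh WU
        elapsed += hours
        completed.append(hours)
    return completed, elapsed

# A hypothetical WU: one expensive first job, then cheaper follow-ups.
jobs = [6.0, 3.0, 2.0, 1.5, 1.0]
done, used = run_workunit(jobs)  # the last two jobs don't fit under 12h
```

This is why "no overall computing time gets lost": the cutoff drops whole pending jobs rather than discarding work already done.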
We have started working on the next generation of wus which will target different properties and which may have fewer restrictions...
Best wishes
Your Harvard CEP team
[Sep 8, 2011 4:31:39 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

Our impression is that the "opt-in" status is the biggest problem for the project in terms of participation and support.
That is misleading. Are you suggesting that it is the "opt-in" that explains why participation and support for CEP2 -- if we measure in terms of uptake of CEP2 by WCG crunchers -- is the lowest among the currently active WCG projects? I seriously doubt it. If anything, my bet is that the removal of "opt-in" for CEP2, if that is what you're calling for, would open WCG, via CEP2, to an avalanche of valid and reasonable complaints from WCG crunchers.

Many casual users are scared or just don't want to be bothered with a project which is not entirely standard and hence decide to pass on it, although their hardware (except for very low-end machines) could easily handle CEP.
Another misleading statement. The above quote should instead read:
Many casual users are scared or just don't want to be bothered with a project which breaks down their hard disks prematurely (and at times, even an expensive SSD like the vertex2 can't cope), although their CPU and other hardware (except for very low-end machines) MAY easily handle CEP2.

We talked to the IBM team about this, but there does not seem to be a favorable resolution.
I can't blame IBM, nor WCG. These companies have been more than generous, almost to a fault, in accepting to host a "project which is not entirely standard".

We don't want to cause problems which may make people leave WCG entirely.
Intentions do not guarantee the results intended. CEP2 may have been the project that caused a number of crunchers to leave WCG, with their broken hardware, I mean, hard disks.

We have started working on the next generation of wus which will target different properties and which may have fewer restrictions...
Try this instead:
We have started working on the next generation of wus which will dramatically reduce demands on the volunteers' hard disk, and will offer crunchers a choice of runtimes.
[Sep 11, 2011 3:33:58 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

Dear andzgrid,
we are sorry to hear that you seem to be unhappy with CEP2 (although we don't quite see from your badges how you can claim much first-hand experience...). Please let us comment on a few points you raised:
Are you suggesting that it is that "opt-in" that explains why the participation and support for CEP2 -- if we measure in terms of uptake of CEP2 by WCG crunchers -- is the lowest among the currently active WCG projects?

Yes, indeed we believe that. Due to the ‘opt-in’, no cruncher who joined WCG before the CEP2 launch participates automatically, and many may not think about, or want to be bothered with, actively changing their settings. Newbies, on the other hand, usually want to get WCG going to test it out; they will often leave out the one ‘hard project with the warning sign’. Many of those may equally not come back later to opt in to CEP2. So, unlike participants in other projects, CEP2 crunchers had to decide at some point to join and deliberately change the necessary settings, which many casual crunchers will realistically not do.
…the removal of "opt-in" for CEP2, if that is what you're calling for…

No, we are not calling for that. As we wrote, we contemplated this and other scenarios with our friends at IBM/WCG and came to the mutual conclusion to leave things as they are. But that also means that the relatively low CEP2 participation is still an open issue.
...a project which breaks down their hard disks prematurely (and at times, even an expensive SSD like the vertex2 can't cope)… CEP2 may have been the project that caused a number of crunchers to leave WCG, with their broken hardware, I mean, hard disks.

Your implied charge that CEP2 is a hard-drive killer is frankly silly. The Q-Chem code is sophisticated research software which has been in development for 20 years and which runs in labs and on clusters around the world. No IT department would touch it if it were a danger to their equipment. If it makes you feel better: CEP2 is also running on all the personal computers (including the laptop this post is written on) of the CEP team and our friends and families…
I can't blame IBM, nor WCG. These companies have been more than generous, almost to a fault, in accepting to host a "project which is not entirely standard".

Just to make this entirely clear: We certainly don’t blame our friends at IBM for anything either, if that's what you implied! They are a great team and we are extremely grateful for their help and support. They are awesome in accommodating our research and finding solutions for complicated problems. We could not wish for better partners.
Best wishes
Your Harvard CEP team
[Sep 12, 2011 4:46:09 PM]
Jack007
Master Cruncher
CANADA
Joined: Feb 25, 2005
Post Count: 1604
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

I must say I support almost nothing andzgrid has said, however,
I'm a little nervous about the HD thing. So for the 2 computers
that my wife and I use daily I don't run CEP2. But on the laptop
that is old and I'm not worried about, I've been running for a year
with no problems. I'm probably just paranoid. However I just put
together a new computer with an I7 2600K and 16 Gigs of RAM,
just so I could run CEP2 on a 10 gig RAMDISK. When I fix my heat
issues I plan to run 4 CEP2 at a time to start, possibly increasing to
6 or 8 as I see how they do.
While only running 2 at a time, they are running phenomenally!
3 to 5 hours a WU. My laptop always gets cut off at 12 hours now
(1.8 or 1.6 GHz, I forget).
You guys are doing a great job, happy to play a small part.
----------------------------------------

[Sep 13, 2011 9:07:45 PM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

The extremely small kernel of truth behind the hard drive wear is simply that the hard drive is being used. Distributed computing in any form for any science will shorten the life of the system it is running on.

But this is negligible at best. Over 99% of people will have discarded their computer for other reasons, such as being too slow, before they see any impact from decreased life expectancy. As for SSDs, there may be a concern with the increased write cycles, but they are still too new to have any accurate data.

TLDR version: don't worry, you'll toss your computer long before it wears out.
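To put a rough number on the SSD write-cycle concern, a back-of-the-envelope endurance estimate helps. The figures below (rated drive endurance and daily write volume) are assumptions chosen for illustration, not measured CEP2 numbers:

```python
# Back-of-the-envelope SSD wear estimate. Both inputs are assumptions
# for illustration; substitute the rated endurance of your own drive
# and your own measured daily write volume.

def years_until_worn(total_bytes_writable, bytes_written_per_day):
    """Years until the drive's rated write endurance is exhausted."""
    return total_bytes_writable / bytes_written_per_day / 365.0

TB = 1000**4
GB = 1000**3

# Assume a drive rated for 72 TB of total writes and a heavy
# 40 GB/day workload: roughly five years of continuous crunching.
years = years_until_worn(72 * TB, 40 * GB)
```

Even under these pessimistic assumptions the drive outlives the typical replacement cycle of the computer around it, which is the point made above.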
----------------------------------------

Distributed computing volunteer since September 27, 2000
[Sep 13, 2011 10:25:07 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

Sorry, Cleanenergy, but I'm decreasing my participation in CEP2 by 5-6 simultaneous tasks. The main reason is that I want to shut down machines more often and don't want to have to wait for CEP2 WUs to checkpoint in order to avoid wasting hours of crunching time. I hope that more frequent checkpointing comes soon.

The 12h cutoff behaviour is a factor on 1 machine, which discarded 3h23m and 4h15m of crunching on the last subjobs of cut-off CEP2 WUs in the last day.

Then there is the problem that running a high proportion of CEP2 WUs on multi-core machines causes the whole machine to slow down. I micromanage sometimes to limit CEP2 to about 25% of real/virtual cores, but it's a PITA.

I look forward to an announcement that some of these issues have been solved.

[OT]@Jack007: "... so I could run CEP2 on a 10 gig RAMDISK. When I fix my heat issues I plan to run 4 CEP2 at a time to start, possibly increasing to 6 or 8 as I see how they do."
The RAMdisk may not solve all of the problems. It should avoid excessive HDD activity, but some of that activity might be delayed writes and may not be directly slowing down the WUs. CEP2 does lots of disc I/O and also generates lots of page faults, so it probably uses lots of CPU time in the OS, which is spent on functions that are probably single-threaded; when multiple WUs use these functions, there is probably a bottleneck in CPU time rather than just in HDD access time.

@andzgrid: When I restrict CEP2 tasks to 25% of the cores, I see very little activity in the HDD LED, which indicates that the OS disc cache is doing its work. No extra stress on the HDD this way.
[/OT]
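The bottleneck described above is essentially Amdahl's law: if some fraction of each WU's time goes through a serialized OS path (disk I/O, page-fault handling), total throughput stops scaling with the number of tasks. A rough illustration, with an assumed serial fraction rather than a measured CEP2 figure:

```python
# Amdahl's law: speedup from n parallel tasks when a fraction s of the
# work is serialized (e.g. single-threaded OS I/O paths). s = 0.2 is an
# assumed value for illustration only.

def amdahl_speedup(n_tasks, serial_fraction):
    """Ideal throughput multiplier relative to a single task."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_tasks)

# With 20% serialized work, adding tasks flattens out quickly:
s4 = amdahl_speedup(4, 0.2)   # 2.5x, not 4x
s8 = amdahl_speedup(8, 0.2)   # ~3.3x, not 8x
```

This is consistent with limiting CEP2 to a fraction of the cores: past a point, extra tasks mostly queue behind the serialized path instead of adding throughput.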
CEP2's Q-Chem software may be very sophisticated from the viewpoint of its science, but it does not seem to be a well-behaved computer system citizen in a multi-core/multiprocessor environment. It was probably written in FORTRAN by chemists/physicists/engineers who are gurus in their sciences, but not in Computer Science.
- Rick
Project Name/Points/Results/Time:
The Clean Energy Project - Phase 2 / 5,092,668 / 5,320 / 3:058:10:16:53
[Sep 17, 2011 7:12:10 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 12h CPU Time Cutoff is Wasting 30% of CPU Time

CEP2-related wear is not something I'm worried about.

Me, I've had hardware failures on systems running CEP2 - 'cuz I accidentally cocked the ATX power cable to a mobo and it eventually vibrated loose enough to arc. (Motherboard, power supply, and hard and SSD drives survived...although I did have to chisel some burnt plastic out of the way to reconnect everything). But that had nothing to do with the workload CEP2 places on a system!

Then, I lost an SSD out of a RAID stripe set on that same system running CEP2 - because when I was taking care of my blunder with the power cable, I forgot to plug a data cable back into one of that RAID set's disks...then when I plugged it back in, I took the lazy way out and didn't pull the unit off the rack...causing me to unintentionally cock the data cable to the other drive. When it vibrated loose, the RAID set dropped out.

But again, nothing to do with CEP2!!

In summary, I've not had any hardware failures attributable to CEP2, and I always run at least 7 tasks on hyperthreading 4-core CPUs, 11 tasks on hyperthreading 6-core CPUs, and all cores on non-hyperthreading CPUs (i.e., if the thing has 4 cores, I run 4 tasks). When ambient temperatures drop in the winter, I'll go back to running all cores, all threads on all CPUs.
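The task counts quoted here follow a simple rule: on hyperthreading CPUs, run one task fewer than the hardware thread count; on non-hyperthreading CPUs, run one per core. A minimal sketch of that rule (the function name is made up for illustration):

```python
# Task-count rule implied by the post: on hyperthreading CPUs, use all
# hardware threads minus one (7 tasks on a 4-core/8-thread CPU, 11 on a
# 6-core/12-thread CPU); otherwise, one task per physical core.

def cep2_task_count(physical_cores, hyperthreading):
    """How many simultaneous tasks to run on a given CPU."""
    if hyperthreading:
        return 2 * physical_cores - 1
    return physical_cores
```

Leaving one thread free keeps the machine responsive while the rest crunch; in cooler months the poster goes back to using every thread.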

I.e., I worry far more about heat than mechanical wear; watch your system temps with something like CoreTemp, and if you see Tj temperatures hitting 65 or 70 degrees Celsius, either make your cooling more efficient or reduce the number of tasks that you're running.

I'm motivated , you see. In light of the fact that every single [insert unit of currency] spent on energy by a scientific research effort is a [insert unit of currency] taken away from that research effort, I personally would call the search for cheap and freely available energy the single most important existing scientific research effort on the face of the planet - perhaps even the single most important task all of humanity currently faces. Particularly since individual humans spend money on energy that they could be spending on medicine, food, shelter, education, etc. etc. etc. Not to mention that the current energy sources are each vulnerable to the random extortion of the incredibly variable and absolutely uncontrollable private taxes called "profits" by any or all of the energy harvesters, the energy distributors, the energy converters, and - worst of all - the speculators. Not to mention entire economies and populations have already been negatively affected by precisely such extortion.

So I'll carefully (lolll...sorta...obviously) watch how my hardware behaves while it quite happily and obliviously crunches CEP2 units.
[Sep 18, 2011 12:12:35 PM]