World Community Grid Forums
Thread Status: Active | Total posts in this thread: 27
Sekerob (Ace Cruncher, Joined: Jul 24, 2005, Post Count: 20043)
Calling Techs: Blue murder time!
Result  Status  Sent  Returned  CPU (h)  Claimed / Granted
E000021_495A_00031w013_1  Valid  12/12/08 08:23:27  12/13/08 13:28:37   7.62   38.9 / 38.9   <- Yeah right, if you're gullible.
E000021_495A_00031w013_0  Valid  12/12/08 08:23:21  12/13/08 09:01:10  13.96  143.2 / 38.9   <- Moi

How the credit logic for this project works is a complete riddle. Until it is resolved, I'm not going to waste further electricity running at just 27% effective throughput (38.9 granted for a 143.2 claim). This is not the first credit joke seen on CEP, and I do know for sure that the parallel jobs running HCC get 100% on the same CPU. Good night and good luck.
WCG
Please help to make the Forums an enjoyable experience for All!
Former Member (Cruncher)
Quote (Sekerob):
E000021_495A_00031w013_1  Valid  12/12/08 08:23:27  12/13/08 13:28:37   7.62   38.9 / 38.9   <- Yeah right, if you're gullible.
E000021_495A_00031w013_0  Valid  12/12/08 08:23:21  12/13/08 09:01:10  13.96  143.2 / 38.9   <- Moi

Seriously. That's a bummer. Looky here:

Result  Status  Sent  Returned  CPU (h)  Claimed / Granted
E000018_291A_00025m00r_2  Valid  12/11/08 22:37:15  12/12/08 22:40:42  16.98  326.7 / 326.7  <- Bum who killed 200 credits for me
E000018_291A_00025m00r_1  Error  12/11/08 06:34:03  12/11/08 22:34:24   1.92   23.0 / 0.0
E000018_291A_00025m00r_0  Valid  12/11/08 06:29:41  12/13/08 02:02:36  30.82  563.9 / 326.7  <- Me

Other than this I have been getting pretty fair results so far, and the machine listed above has consistently been claiming under the granted credit; it usually gets more than it requests on this project. Oh well. At least it's valid.
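In both pairs above, the granted figure is exactly the lower of the two valid claims (38.9 from the 143.2 / 38.9 pair, 326.7 from the 563.9 / 326.7 pair). A minimal sketch of that apparent rule, inferred only from these numbers and not from WCG's actual validator source:

```cpp
#include <algorithm>

// Assumed validator rule (an inference from the quoted results, not
// WCG's code): with a quorum of two valid results, each host is
// granted the smaller of the two claimed credits.
double granted_credit(double claim_a, double claim_b) {
    return std::min(claim_a, claim_b);
}

// granted_credit(143.2, 38.9)  == 38.9   (Sekerob's pair)
// granted_credit(563.9, 326.7) == 326.7  (the pair in this post)
```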
Rickjb (Veteran Cruncher, Australia, Joined: Sep 17, 2006, Post Count: 666)
[Edit: See Current CEP bugs / problems]

It might be interesting to compare CEP throughput under 64-bit Windows or Linux with its throughput under the 32-bit versions. Kernel activity, including the disc cache, should run faster in 64-bit, while the 32-bit user code should run at the same speed, so a big difference in throughput would suggest that the page faulting is slowing CEP down.

FWIW, I recently performed this test for FAAH, HCC and HPF2 by running the same single WU from each of these projects under Windows XP-32 and XP-64. I found XP-64 to be about 1% faster in each case.

[Edit 1 times, last edit by Rickjb at Dec 18, 2008 5:08:06 AM]
Former Member (Cruncher)
Yes, CHARMM is written in FORTRAN. Good detective work there (although we have mentioned it here before). AfricanClimate@Home was FORTRAN, too.

These are soft page faults, not hard faults. That means the memory manager can satisfy each request with a page that is already in physical memory; it has nothing to do with the disk cache. From my sketchy understanding, the entire memory model used by these ancient behemoth applications is different to the kind I'm familiar with today. Designed to run on supercomputers, they allocate huge amounts of memory up front, then use it as they need it. You can observe the huge VM size compared to the (relatively) small working set. The irony here is that the last project that had a page-faulting problem was (if I recall correctly) caused by incautious use of C++ new and delete.
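A quick way to see this soft-fault behaviour for yourself: commit a large region, then touch it one page at a time. Each first touch is a soft fault served from memory, with no disk involved, and the working set grows only as pages are touched. A minimal Windows sketch (assuming MSVC or MinGW, linked against psapi.lib; the 300 MB figure just mimics the WU's VM size):

```cpp
#include <windows.h>
#include <psapi.h>    // GetProcessMemoryInfo; link with psapi.lib
#include <cstdio>

int main() {
    const SIZE_T size = 300u * 1024 * 1024;   // ~300 MB, like the CEP VM size
    char* buf = static_cast<char*>(VirtualAlloc(
        NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE));
    if (!buf) return 1;

    PROCESS_MEMORY_COUNTERS before = { sizeof(before) };
    PROCESS_MEMORY_COUNTERS after  = { sizeof(after) };
    GetProcessMemoryInfo(GetCurrentProcess(), &before, sizeof(before));

    // First touch of every 4 KB page: each one is a soft page fault.
    for (SIZE_T i = 0; i < size; i += 4096) buf[i] = 1;

    GetProcessMemoryInfo(GetCurrentProcess(), &after, sizeof(after));
    printf("soft faults from first touch: %lu\n",
           (unsigned long)(after.PageFaultCount - before.PageFaultCount));
    printf("working set: %lu KB of %lu KB committed\n",
           (unsigned long)(after.WorkingSetSize / 1024),
           (unsigned long)(size / 1024));

    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
```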
Rickjb (Veteran Cruncher, Australia, Joined: Sep 17, 2006, Post Count: 666)
So the page faults aren't happening anywhere near Discworld. Thanks, Didactylos. I wasn't sure about the nature of the page faults, but the fact remains that the high page-fault rate (delta) means the o/s is getting a hammering.

Quote: "Designed to run on supercomputers, they allocate huge amounts of memory, then use it as they need. You will observe the huge VM size compared to the (relatively) small working set."

I think that CHARMM is not only allocating the memory, but then releasing some of it, over and over. If you watch the startup of an FAAH WU in Task Manager, you will see that it too generates many page faults as it grows its VM size to about the same 300 MB, but then the page-fault rate declines and the VM size stays constant. FAAH's Autodock may be using user-mode heap management functions instead of the o/s.

CHARMM's approach would leave as much system memory as possible to other processes, which would be the correct one in a multitasking environment where memory was scarce and expensive, provided it was releasing a big proportion of the memory it had grabbed. However, the VM size is remaining fairly constant, so recycling memory internally might be a better scheme, as sketched below.

[Added] Didactylos' next post sounds reasonable to me. I hope that we haven't lost tekennelly, the originator of the thread, with all of this tech talk, and that we've all gained some insights into what is happening.

[Edit 2 times, last edit by Rickjb at Dec 15, 2008 3:11:54 AM]
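To make the allocate/release churn versus internal recycling contrast concrete, here is a hypothetical sketch (the chunk size and loops are invented for illustration; this is not CHARMM's actual code). Pattern A keeps handing pages back to the o/s, so every pass can soft-fault them in again; pattern B faults once and then stays quiet:

```cpp
#include <cstddef>
#include <vector>

const std::size_t kChunk = 64 * 1024 * 1024;   // invented size, for illustration

// Pattern A: allocate, touch, release, repeat. Large blocks typically go
// straight back to the o/s on delete, so each pass can soft-fault every
// page in again. This is the churn CHARMM appears to be generating.
void churn(int passes) {
    for (int p = 0; p < passes; ++p) {
        char* buf = new char[kChunk];
        for (std::size_t i = 0; i < kChunk; i += 4096) buf[i] = 1;
        delete[] buf;
    }
}

// Pattern B: recycle one buffer internally. The pages fault in once on
// the first pass and then stay in the working set.
void recycle(int passes) {
    std::vector<char> buf(kChunk);
    for (int p = 0; p < passes; ++p) {
        for (std::size_t i = 0; i < kChunk; i += 4096) buf[i] = 1;
    }
}
```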
Former Member (Cruncher)
Yes, soft faults have a heavy kernel-time cost.

But I think we have reached the end of useful speculation and have nothing left but guesses. Normally I would ask the techs to give us some insight, but I think their time is better spent actually fixing it... maybe they will have time when the dust has settled.

Some of the BOINC API code suffers from early optimisation. It tries to be C, but there is enough C++ in it to make it very difficult to compile with a C compiler. So, when the developers later ran into problems with buffers that were too small, all too often the solution was to allocate a 64k buffer or something odd like that. I don't know whether the API code is to blame in this instance, but the potential is there.
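A hypothetical illustration of that quick-fix pattern (invented code, not actual BOINC API source): when a fixed buffer overflows once, the easy patch is a bigger magic number rather than sizing to the data.

```cpp
#include <cstdio>
#include <string>

// The quick fix: this buffer overflowed once, so someone bumped it to
// 64k. Every caller now drags a 64 KB block around whether the message
// needs 40 bytes or not. (Hypothetical example.)
char message_buf[65536];

void format_status_fixed(double fraction_done) {
    sprintf(message_buf, "<fraction_done>%.6f</fraction_done>", fraction_done);
}

// Sizing to the data instead: no magic number to outgrow.
std::string format_status_sized(double fraction_done) {
    char tmp[64];
    snprintf(tmp, sizeof(tmp), "<fraction_done>%.6f</fraction_done>", fraction_done);
    return std::string(tmp);
}
```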
Rickjb (Veteran Cruncher, Australia, Joined: Sep 17, 2006, Post Count: 666)
As I suggest in the thread 2 WUs stopped early ..., I am wondering whether the high page-fault rate is responsible for the various errors that people are experiencing with CEP: Error after 9 hours of crunching, big difference in run times/credit claims, What are these errors about?.

If you get an error or early termination of a CEP WU, I suggest you note:
- Your operating system and its version
- Whether it was on a single-CPU machine or a multi
- If a multi, whether more than 1 CEP WU would have been running at the time of the error

[Edit: Later observation by me suggests that early termination of CEP WUs does not seem to be related to running multiple CEPs simultaneously. See Current CEP bugs / problems]

[Edit 1 times, last edit by Rickjb at Dec 18, 2008 5:19:32 AM]
Crystal Pellet (Veteran Cruncher, Joined: May 21, 2008, Post Count: 1408)
Just uploaded the 'shortest' of three long WUs: no errors, NORMAL STOP. The other two are still running, and a Beta 6.23 is Ready to Start on an XP32 1.6GHz box once a ~57h CEP 6.19 has finished.

Result  Status  Sent  Due/Returned  CPU (h)  Claimed / Granted
E000026_191A_00035000v_1  In Progress         13-12-08 17:13:24  20-12-08 17:13:24   0.00    0.0 / 0.0
E000026_191A_00035000v_0  Pending Validation  13-12-08 17:10:30  15-12-08 18:51:31  34.95  491.0 / 0.0  <- me

Some specs of this WU: Vista 64-bit, quad, 2.1GHz; for half of the time it ran with another CEP on another core. Uploaded 5 files: 1234kB. Peak WS 72,648 KB; VM 338,212 KB; page faults 1,073,363,000, which is about 7335 PFs/sec.
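For anyone who wants a PFs/sec figure like this without eyeballing the Task Manager delta by hand, a sketch that samples the cumulative fault counter twice over an interval (assumptions: Windows, linked against psapi.lib, and a process ID you look up yourself in Task Manager):

```cpp
#include <windows.h>
#include <psapi.h>    // GetProcessMemoryInfo; link with psapi.lib
#include <cstdio>
#include <cstdlib>

// Usage: pfrate <pid>   (pid of the CEP science app, from Task Manager)
int main(int argc, char** argv) {
    if (argc < 2) return 1;
    DWORD pid = (DWORD)atoi(argv[1]);
    HANDLE proc = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                              FALSE, pid);
    if (!proc) return 1;

    PROCESS_MEMORY_COUNTERS a = { sizeof(a) };
    PROCESS_MEMORY_COUNTERS b = { sizeof(b) };
    GetProcessMemoryInfo(proc, &a, sizeof(a));
    Sleep(10000);                       // 10-second sample window
    GetProcessMemoryInfo(proc, &b, sizeof(b));

    printf("%.0f page faults/sec\n",
           (b.PageFaultCount - a.PageFaultCount) / 10.0);
    CloseHandle(proc);
    return 0;
}
```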
Former Member (Cruncher)
I suspect that part of what is happening with the severe underclaimers is that, while CEP was running and clogging things up with page faults, their benchmarks ran and produced lower-than-usual numbers. I've noticed a big difference in the benchmark here before while running HPF2, which seems to have very inefficient coding as well. Running HCC gives me great benchmarks, though, even if it has to pause and restart automatically.
Sekerob (Ace Cruncher, Joined: Jul 24, 2005, Post Count: 20043)
Probably it's a choice of words, but I'd like to clarify that it's mostly undergranting (overclaiming), not underclaiming. When the "invalid" bug has been resolved (beta 6.23 is underway), those should for the most part go away.

Benchmarks have nothing at all to do with the particular task at hand. What matters is how efficiently the task runs relative to the benchmark expectation stored in the whetstone/dhrystone values, which are updated every 5th day. Running one CEP at a time shows zero adverse effect on my devices, with credits right close to the mark. Similarly, I found HPF2 to run quicker, and thus earn a better credit grant ratio, when run alone rather than 4 in parallel on a quad, but by no comparison to the CEP jobs. Now RICE is unique: it runs a 5-minute internal benchmark for each job using a reference mini-WU. Most devices seem to do very well on this; at least I always seem to get 5 to 10% more than claimed.
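For reference, a sketch of the benchmark-times-time claim being described, as I understand the old BOINC scheme (the 200-per-day constant comes from the cobblestone definition as I recall it; treat the constant and the exact form as assumptions, not WCG's server code):

```cpp
// Claimed credit under the classic benchmark scheme (assumed form):
// CPU days multiplied by the average of the host's two benchmark
// scores, scaled so a reference host (1 GFLOPS Whetstone, 1 GIPS
// Dhrystone) claims 200 cobblestones per CPU-day.
double claimed_credit(double cpu_seconds,
                      double whetstone_ops_per_sec,
                      double dhrystone_ops_per_sec) {
    const double kCreditPerRefDay = 200.0;   // assumed cobblestone constant
    double avg_gops = (whetstone_ops_per_sec + dhrystone_ops_per_sec) / 2.0 / 1e9;
    return (cpu_seconds / 86400.0) * avg_gops * kCreditPerRefDay;
}
```

On this model an inefficient, page-fault-bound task claims more simply because it burns more CPU-days for the same work, which is consistent with the overclaiming described above.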
WCG
Please help to make the Forums an enjoyable experience for All!