World Community Grid Forums
Thread Status: Active | Total posts in this thread: 27
Sekerob (Ace Cruncher, Joined: Jul 24, 2005, Post Count: 20043)
Calling Techs: Blue murder time!
Result  Status  Sent  Returned  CPU (h)  Claimed / Granted
E000021_495A_00031w013_1  Valid  12/12/08 08:23:27  12/13/08 13:28:37   7.62   38.9 / 38.9   <- Yeah right, if you're gullible.
E000021_495A_00031w013_0  Valid  12/12/08 08:23:21  12/13/08 09:01:10  13.96  143.2 / 38.9   <- Moi

How the credit logic for this project works is a complete riddle. Until it is resolved, I'm not going to waste further electricity running at just 27% effective throughput (38.9 granted for a 143.2 claim). This is not the first credit joke seen on CEP, and I do know for sure that the parallel jobs running HCC get 100% on the same CPU. Good night and good luck.
WCG
Please help to make the Forums an enjoyable experience for All!
Former Member (Cruncher)
Quote (Sekerob):
E000021_495A_00031w013_1  Valid  12/12/08 08:23:27  12/13/08 13:28:37   7.62   38.9 / 38.9   <- Yeah right, if you're gullible.
E000021_495A_00031w013_0  Valid  12/12/08 08:23:21  12/13/08 09:01:10  13.96  143.2 / 38.9   <- Moi

Seriously. That's a bummer. Looky here:

Result  Status  Sent  Returned  CPU (h)  Claimed / Granted
E000018_291A_00025m00r_2  Valid  12/11/08 22:37:15  12/12/08 22:40:42  16.98  326.7 / 326.7  <- Bum who killed 200 credits for me
E000018_291A_00025m00r_1  Error  12/11/08 06:34:03  12/11/08 22:34:24   1.92   23.0 / 0.0
E000018_291A_00025m00r_0  Valid  12/11/08 06:29:41  12/13/08 02:02:36  30.82  563.9 / 326.7  <- Me

Other than this I have been getting pretty fair results so far, and the machine listed above has consistently been claiming under the granted credit; it usually gets more than it requests on this project. Oh well. At least it's valid.
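In both pairs above, the granted figure is exactly the lower of the two valid claims (38.9 from the 143.2 / 38.9 pair, 326.7 from the 563.9 / 326.7 pair). A minimal sketch of that apparent rule, inferred only from these numbers and not from WCG's actual validator source:

```cpp
#include <algorithm>

// Assumed validator rule (an inference from the quoted results, not
// WCG's code): with a quorum of two valid results, each host is
// granted the smaller of the two claimed credits.
double granted_credit(double claim_a, double claim_b) {
    return std::min(claim_a, claim_b);
}

// granted_credit(143.2, 38.9)  == 38.9   (Sekerob's pair)
// granted_credit(563.9, 326.7) == 326.7  (the pair in this post)
```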
Rickjb (Veteran Cruncher, Australia, Joined: Sep 17, 2006, Post Count: 666)
[Edit: See Current CEP bugs / problems]

It might be interesting to compare CEP throughput under 64-bit Windows or Linux with its throughput under the 32-bit versions. Kernel activity, including the disc cache, should run faster in 64-bit, while the 32-bit user code should run at the same speed, so a big difference in throughput would suggest that the page faulting is slowing CEP down.

FWIW, I recently performed this test for FAAH, HCC and HPF2 by running the same single WU from each of these projects under Windows XP-32 and XP-64. I found XP-64 to be about 1% faster in each case.

[Edit 1 times, last edit by Rickjb at Dec 18, 2008 5:08:06 AM]
Former Member (Cruncher)
Yes, CHARMM is written in FORTRAN. Good detective work there (although we have mentioned it here before). AfricanClimate@Home was FORTRAN, too.

These are soft page faults, not hard faults. That means the memory manager can satisfy each request with a page that is already in physical memory; it has nothing to do with the disk cache. From my sketchy understanding, the entire memory model used by these ancient behemoth applications is different to the kind I'm familiar with today. Designed to run on supercomputers, they allocate huge amounts of memory up front, then use it as they need it. You can observe the huge VM size compared to the (relatively) small working set. The irony here is that the last project that had a page-faulting problem was (if I recall correctly) caused by incautious use of C++ new and delete.
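A quick way to see this soft-fault behaviour for yourself: commit a large region, then touch it one page at a time. Each first touch is a soft fault served from memory, with no disk involved, and the working set grows only as pages are touched. A minimal Windows sketch (assuming MSVC or MinGW, linked against psapi.lib; the 300 MB figure just mimics the WU's VM size):

```cpp
#include <windows.h>
#include <psapi.h>    // GetProcessMemoryInfo; link with psapi.lib
#include <cstdio>

int main() {
    const SIZE_T size = 300u * 1024 * 1024;   // ~300 MB, like the CEP VM size
    char* buf = static_cast<char*>(VirtualAlloc(
        NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE));
    if (!buf) return 1;

    PROCESS_MEMORY_COUNTERS before = { sizeof(before) };
    PROCESS_MEMORY_COUNTERS after  = { sizeof(after) };
    GetProcessMemoryInfo(GetCurrentProcess(), &before, sizeof(before));

    // First touch of every 4 KB page: each one is a soft page fault.
    for (SIZE_T i = 0; i < size; i += 4096) buf[i] = 1;

    GetProcessMemoryInfo(GetCurrentProcess(), &after, sizeof(after));
    printf("soft faults from first touch: %lu\n",
           (unsigned long)(after.PageFaultCount - before.PageFaultCount));
    printf("working set: %lu KB of %lu KB committed\n",
           (unsigned long)(after.WorkingSetSize / 1024),
           (unsigned long)(size / 1024));

    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
```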
Rickjb (Veteran Cruncher, Australia, Joined: Sep 17, 2006, Post Count: 666)
So the page faults aren't happening anywhere near Discworld. Thanks, Didactylos. I wasn't sure about the nature of the page faults, but the fact remains that the high page-fault rate (delta) means the o/s is getting a hammering.

Quote: "Designed to run on supercomputers, they allocate huge amounts of memory, then use it as they need. You will observe the huge VM size compared to the (relatively) small working set."

I think that CHARMM is not only allocating the memory, but then releasing some of it, over and over. If you watch the startup of an FAAH WU in Task Manager, you will see that it too generates many page faults as it grows its VM size to about the same 300 MB, but then the page-fault rate declines and the VM size stays constant. FAAH's Autodock may be using user-mode heap management functions instead of the o/s.

CHARMM's approach would leave as much system memory as possible to other processes, which would be the correct one in a multitasking environment where memory was scarce and expensive, provided it was releasing a big proportion of the memory it had grabbed. However, the VM size is remaining fairly constant, so recycling memory internally might be a better scheme, as sketched below.

[Added] Didactylos' next post sounds reasonable to me. I hope that we haven't lost tekennelly, the originator of the thread, with all of this tech talk, and that we've all gained some insights into what is happening.

[Edit 2 times, last edit by Rickjb at Dec 15, 2008 3:11:54 AM]
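To make the allocate/release churn versus internal recycling contrast concrete, here is a hypothetical sketch (the chunk size and loops are invented for illustration; this is not CHARMM's actual code). Pattern A keeps handing pages back to the o/s, so every pass can soft-fault them in again; pattern B faults once and then stays quiet:

```cpp
#include <cstddef>
#include <vector>

const std::size_t kChunk = 64 * 1024 * 1024;   // invented size, for illustration

// Pattern A: allocate, touch, release, repeat. Large blocks typically go
// straight back to the o/s on delete, so each pass can soft-fault every
// page in again. This is the churn CHARMM appears to be generating.
void churn(int passes) {
    for (int p = 0; p < passes; ++p) {
        char* buf = new char[kChunk];
        for (std::size_t i = 0; i < kChunk; i += 4096) buf[i] = 1;
        delete[] buf;
    }
}

// Pattern B: recycle one buffer internally. The pages fault in once on
// the first pass and then stay in the working set.
void recycle(int passes) {
    std::vector<char> buf(kChunk);
    for (int p = 0; p < passes; ++p) {
        for (std::size_t i = 0; i < kChunk; i += 4096) buf[i] = 1;
    }
}
```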
Former Member (Cruncher)
Yes, soft faults have a heavy kernel-time cost.

But I think we have reached the end of useful speculation and have nothing left but guesses. Normally I would ask the techs to give us some insight, but I think their time is better spent actually fixing it... maybe they will have time when the dust has settled.

Some of the BOINC API code suffers from early optimisation. It tries to be C, but there is enough C++ in it to make it very difficult to compile with a C compiler. So, when the developers later ran into problems with buffers that were too small, all too often the solution was to allocate a 64k buffer or something odd like that. I don't know whether the API code is to blame in this instance, but the potential is there.
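A hypothetical illustration of that quick-fix pattern (invented code, not actual BOINC API source): when a fixed buffer overflows once, the easy patch is a bigger magic number rather than sizing to the data.

```cpp
#include <cstdio>
#include <string>

// The quick fix: this buffer overflowed once, so someone bumped it to
// 64k. Every caller now drags a 64 KB block around whether the message
// needs 40 bytes or not. (Hypothetical example.)
char message_buf[65536];

void format_status_fixed(double fraction_done) {
    sprintf(message_buf, "<fraction_done>%.6f</fraction_done>", fraction_done);
}

// Sizing to the data instead: no magic number to outgrow.
std::string format_status_sized(double fraction_done) {
    char tmp[64];
    snprintf(tmp, sizeof(tmp), "<fraction_done>%.6f</fraction_done>", fraction_done);
    return std::string(tmp);
}
```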
Rickjb (Veteran Cruncher, Australia, Joined: Sep 17, 2006, Post Count: 666)
As I suggest in the thread 2 WUs stopped early ..., I am wondering whether the high page-fault rate is responsible for the various errors that people are experiencing with CEP: Error after 9 hours of crunching, big difference in run times/credit claims, What are these errors about?.

If you get an error or early termination of a CEP WU, I suggest you note:
- Your operating system and its version
- Whether it was on a single-CPU machine or a multi
- If a multi, whether more than 1 CEP WU would have been running at the time of the error

[Edit: Later observation by me suggests that early termination of CEP WUs does not seem to be related to running multiple CEPs simultaneously. See Current CEP bugs / problems]

[Edit 1 times, last edit by Rickjb at Dec 18, 2008 5:19:32 AM]
Crystal Pellet (Veteran Cruncher, Joined: May 21, 2008, Post Count: 1408)
Just uploaded the 'shortest' of three long WUs: no errors, NORMAL STOP. The other two are still running, and a Beta 6.23 is Ready to Start on an XP32 1.6GHz box once a ~57h CEP 6.19 has finished.

Result  Status  Sent  Due/Returned  CPU (h)  Claimed / Granted
E000026_191A_00035000v_1  In Progress         13-12-08 17:13:24  20-12-08 17:13:24   0.00    0.0 / 0.0
E000026_191A_00035000v_0  Pending Validation  13-12-08 17:10:30  15-12-08 18:51:31  34.95  491.0 / 0.0  <- me

Some specs of this WU: Vista 64-bit, quad, 2.1GHz; for half of the time it ran with another CEP on another core. Uploaded 5 files: 1234kB. Peak WS 72,648 KB; VM 338,212 KB; page faults 1,073,363,000, which is about 7335 PFs/sec.
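For anyone who wants a PFs/sec figure like this without eyeballing the Task Manager delta by hand, a sketch that samples the cumulative fault counter twice over an interval (assumptions: Windows, linked against psapi.lib, and a process ID you look up yourself in Task Manager):

```cpp
#include <windows.h>
#include <psapi.h>    // GetProcessMemoryInfo; link with psapi.lib
#include <cstdio>
#include <cstdlib>

// Usage: pfrate <pid>   (pid of the CEP science app, from Task Manager)
int main(int argc, char** argv) {
    if (argc < 2) return 1;
    DWORD pid = (DWORD)atoi(argv[1]);
    HANDLE proc = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                              FALSE, pid);
    if (!proc) return 1;

    PROCESS_MEMORY_COUNTERS a = { sizeof(a) };
    PROCESS_MEMORY_COUNTERS b = { sizeof(b) };
    GetProcessMemoryInfo(proc, &a, sizeof(a));
    Sleep(10000);                       // 10-second sample window
    GetProcessMemoryInfo(proc, &b, sizeof(b));

    printf("%.0f page faults/sec\n",
           (b.PageFaultCount - a.PageFaultCount) / 10.0);
    CloseHandle(proc);
    return 0;
}
```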
Former Member (Cruncher)
I suspect that part of what is happening with the severe underclaimers is that, while CEP was running and clogging things up with page faults, their benchmarks ran and produced lower-than-usual numbers. I've noticed a big difference in the benchmark here before while running HPF2, which seems to have very inefficient coding as well. Running HCC gives me great benchmarks, though, even if it has to pause and restart automatically.
Sekerob (Ace Cruncher, Joined: Jul 24, 2005, Post Count: 20043)
Probably it's a choice of words, but I'd like to clarify that it's mostly undergranting (overclaiming), not underclaiming. When the "invalid" bug has been resolved (beta 6.23 is underway), those should for the most part go away.

Benchmarks have nothing at all to do with the particular task at hand. What matters is how efficiently the task runs relative to the benchmark expectation stored in the whetstone/dhrystone values, which are updated every 5th day. Running one CEP at a time shows zero adverse effect on my devices, with credits right close to the mark. Similarly, I found HPF2 to run quicker, and thus earn a better credit grant ratio, when run alone rather than 4 in parallel on a quad, but by no comparison to the CEP jobs. Now RICE is unique: it runs a 5-minute internal benchmark for each job using a reference mini-WU. Most devices seem to do very well on this; at least I always seem to get 5 to 10% more than claimed.
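For reference, a sketch of the benchmark-times-time claim being described, as I understand the old BOINC scheme (the 200-per-day constant comes from the cobblestone definition as I recall it; treat the constant and the exact form as assumptions, not WCG's server code):

```cpp
// Claimed credit under the classic benchmark scheme (assumed form):
// CPU days multiplied by the average of the host's two benchmark
// scores, scaled so a reference host (1 GFLOPS Whetstone, 1 GIPS
// Dhrystone) claims 200 cobblestones per CPU-day.
double claimed_credit(double cpu_seconds,
                      double whetstone_ops_per_sec,
                      double dhrystone_ops_per_sec) {
    const double kCreditPerRefDay = 200.0;   // assumed cobblestone constant
    double avg_gops = (whetstone_ops_per_sec + dhrystone_ops_per_sec) / 2.0 / 1e9;
    return (cpu_seconds / 86400.0) * avg_gops * kCreditPerRefDay;
}
```

On this model an inefficient, page-fault-bound task claims more simply because it burns more CPU-days for the same work, which is consistent with the overclaiming described above.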
WCG
Please help to make the Forums an enjoyable experience for All!