Thread Status: Active
Total posts in this thread: 11
Posts: 11   Pages: 2
This topic has been viewed 2503 times and has 10 replies
Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Status: Offline
Errors when running too many CEP2 WUs at once?

I noticed an issue with one of my crunchers a few weeks back where it was spitting out basically nothing but errors. After some work, I determined that all of the WUs in progress threw errors when I was running "too many" CEP2 WUs on the system at once (more than about ten or twelve).

I know that running this many WUs isn't optimal for performance (due to IO bottlenecks), so that issue has already been addressed. However, now that I'm confident that it was an issue arising from CEP2, I'm curious why it was happening. Do you guys have any insights?

BTW, the system in question is:
- 4x AMD Opteron 8350 (quad core, basically a Phenom I X4)
- 4GB (8x512MB) RAM
- Tyan Quad Socket F board (I forget the exact model)
- Hitachi 250GB HDD
- Antec 550W PSU
- Radeon X1300

All running Windows Server 2008 R2 Enterprise.
[Apr 30, 2013 10:20:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Errors when running too many CEP2 WUs at once?

The memory requirement for CEP2 is about 1 GB per WU; I suspect you are running into a low-memory problem.
[Apr 30, 2013 11:13:22 PM]
Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Status: Offline
Re: Errors when running too many CEP2 WUs at once?

Quote:
"The memory requirement for CEP2 is about 1 GB per WU; I suspect you are running into a low-memory problem."

A quick check on my laptop would indicate otherwise:

[Screenshot of Task Manager not preserved]

Here I'm seeing just about a hundred megabytes per WU (and yes, I realize that I have quite a few of them running at once, but this system has an SSD, so I expect the IO performance to be much better).

Memory usage on the AMD 4P seems pretty much constant at 2-2.5GB regardless of what WCG WUs are running.
[Apr 30, 2013 11:37:06 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Errors when running too many CEP2 WUs at once?

Hi Aperture_Science_Innovators,
Speaking of memory, how large is your virtual memory? Task Manager just shows the working set, which is usually a tiny portion of the total virtual memory consumed.

Lawrence
[May 1, 2013 1:21:08 AM]
Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Status: Offline
Re: Errors when running too many CEP2 WUs at once?

Quote:
"Hi Aperture_Science_Innovators,
Speaking of memory, how large is your virtual memory? Task Manager just shows the working set, which is usually a tiny portion of the total virtual memory consumed.

Lawrence"

I just checked: 6GB on the i7 laptop, 12GB on the AMD 4P.
[May 1, 2013 3:29:41 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Errors when running too many CEP2 WUs at once?

Hi Aperture_Science_Innovators,
Another good idea shot down! 12 GB of virtual memory is about right for the load that Task Manager showed. I suppose that you have Windows set to automatically increase the virtual memory (page file) size if required, which is how you should have it set.

Right now I am out of suggestions.

Lawrence
[May 1, 2013 4:46:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Errors when running too many CEP2 WUs at once?

I'm running 4 GFAM/DSFL tasks and the Performance tab of Task Manager says that 3.54GB of physical RAM is committed [overall]. I see Firefox gobbling 654MB; what else shows up if you sort the processes by size? What confuses me is that the screenshot implies there are 6 CEP2 jobs running. That's 'testing' it, squeezing this into 4GB [WCG specs 1GB per job], which means very high I/O between RAM and VM [on disk], plus storage of the checkpoints too.
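
For a rough sense of that squeeze, here is a minimal Python sketch of the arithmetic. It assumes 4GB (4096MB) of physical RAM and uses the two per-job figures mentioned in this thread: the 1GB WCG spec and the roughly 100MB working set seen in Task Manager. The function name `max_jobs_in_ram` and the `reserved_mb` parameter are made up for illustration; nothing here comes from BOINC or WCG.

```python
def max_jobs_in_ram(total_mb: int, per_job_mb: int, reserved_mb: int = 0) -> int:
    """Return how many jobs fit in physical RAM without paging.

    All arguments are in megabytes. `reserved_mb` is a hypothetical
    allowance for the OS and other programs.
    """
    usable = total_mb - reserved_mb
    if usable <= 0 or per_job_mb <= 0:
        return 0
    return usable // per_job_mb


# 4GB of RAM with the 1GB-per-job figure from the WCG spec.
print(max_jobs_in_ram(4096, 1024))  # 4
# Same RAM with the ~100MB working set reported from Task Manager.
print(max_jobs_in_ram(4096, 100))   # 40
```

Neither number leaves anything for the OS or other programs; pass `reserved_mb` to set some aside. The working-set figure also understates total virtual memory use, as noted earlier in the thread, so the second result is likely optimistic.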
[May 1, 2013 5:03:53 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Errors when running too many CEP2 WUs at once?

I think you have enough physical memory to run more than 12 CEP2 jobs, though if you're running WCG projects on all 16 cores there won't be much to spare. But I think your problem is something other than amount of memory:

1. When a CEP2 WU starts, it immediately unzips several thousand (!) small data files (refrains from comments re FORTRAN programmers and their mastery of sound computer science principles). This causes a lot of overhead for the O/S, as well as the storage device hardware. If a number of CEP2 WUs try to start up simultaneously, the system can freeze for many seconds, possibly because file creation is a single-threaded process and creating all those files just overwhelms it. In that situation, other BOINC tasks can experience timeouts, causing them to exit. "No heartbeat from client" is a common error message.
I haven't looked at how much O/S overhead there is in accessing all of these files once the main part of the CEP2 calculations is underway, but with so many files it must be considerable.

2. Page faults: I suggest you have a look at this parameter in your task manager (View >> Select columns). CEP2 produces heaps of these, and I suspect that they too are single-threaded in the O/S.
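
To get a feel for the start-up cost described in point 1, a rough Python sketch like the one below times the creation of a few thousand small files in a temporary directory. The file count (3000), the file size (512 bytes), and the name `create_small_files` are assumptions for illustration only. This is not how CEP2 unpacks its data, just a way to compare how different drives cope with many tiny files.

```python
import os
import tempfile
import time


def create_small_files(directory: str, count: int, size_bytes: int = 512) -> float:
    """Create `count` small files in `directory` and return the elapsed seconds."""
    payload = b"x" * size_bytes
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"part_{i:05d}.dat")
        with open(path, "wb") as fh:
            fh.write(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    # The temporary directory and its files are removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp:
        elapsed = create_small_files(tmp, 3000)
        print(f"created 3000 files in {elapsed:.2f}s")
```

Running several copies at the same moment on a slow hard drive would be the closer analogue to a number of WUs starting together; the timing will vary a lot between systems.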

Hope that helps.

BTW, the CEP2 science team are working on upgrades to the program. No details are known, but fingers crossed they may be addressing some of these problems.
[May 2, 2013 2:58:19 PM]
Aperture_Science_Innovators
Advanced Cruncher
United States
Joined: Jul 6, 2009
Post Count: 139
Status: Offline
Re: Errors when running too many CEP2 WUs at once?

Quote:
"I'm running 4 GFAM/DSFL tasks and the Performance tab of Task Manager says that 3.54GB of physical RAM is committed [overall]. I see Firefox gobbling 654MB; what else shows up if you sort the processes by size? What confuses me is that the screenshot implies there are 6 CEP2 jobs running. That's 'testing' it, squeezing this into 4GB [WCG specs 1GB per job], which means very high I/O between RAM and VM [on disk], plus storage of the checkpoints too."

I just checked again on my laptop: Firefox is currently using 1.1GB, Adobe Flash Player and Reader are both at about 120MB, and there are eight WCG tasks each using 75-100MB or so. But there are also about 90 other processes running, so it does add up.

The laptop is running on an SSD, so I assume that the additional disk throughput is sufficient that 6 CEP2 WUs at once are OK? CEP2 is the project I'm most interested in, so I try to run it as much as I think my hardware is capable of.

Quote:
"Hi Aperture_Science_Innovators,
Another good idea shot down! 12 GB of virtual memory is about right for the load that Task Manager showed. I suppose that you have Windows set to automatically increase the virtual memory (page file) size if required, which is how you should have it set.

Right now I am out of suggestions.

Lawrence"

I would have thought that 4GB + 12GB should be sufficient. I actually just checked now, and the AMD 4P is running on a 5400RPM laptop drive (I forgot that I had set it up like that). Is it possible that the speed of the disk was negatively impacting things?

Thanks guys!
[May 2, 2013 4:13:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Errors when running too many CEP2 WUs at once?

Refer to what Rickjb and Lawrence already wrote... yes, the I/O to a 5400 RPM spinning drive will subtract from the success. Staggered starting of multiple CEP2 tasks is recommended; unfortunately it is hard to automate [nothing in BOINC to control this], but once they're all running past the first checkpoint, the chance of 2 or more finishing/starting simultaneously will be minimal. My 8-core has 8GB DDR3 and becomes inefficient, particularly when the machine is in use, but it 'can' do 8 in hands-off mode.
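
The idea of staggered starts can be sketched generically in Python. This is only an illustration of the scheduling pattern, not something that plugs into BOINC; `start_offsets`, `run_staggered`, and the injectable `sleep` parameter are hypothetical names, and the gap between starts is whatever you judge long enough for one task to finish unpacking.

```python
import time
from typing import Callable, List, Sequence


def start_offsets(task_count: int, gap_seconds: float) -> List[float]:
    """Offsets (seconds from now) at which each task would be started."""
    return [i * gap_seconds for i in range(task_count)]


def run_staggered(
    starters: Sequence[Callable[[], None]],
    gap_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call each starter in turn, pausing `gap_seconds` between consecutive starts."""
    for index, starter in enumerate(starters):
        if index > 0:
            sleep(gap_seconds)
        starter()


if __name__ == "__main__":
    # Three placeholder tasks started two seconds apart.
    tasks = [lambda n=n: print(f"starting task {n}") for n in range(3)]
    run_staggered(tasks, gap_seconds=2)
```

In practice each starter would be whatever manual step resumes a task; the point is only that no two start-ups overlap.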
[May 2, 2013 4:55:38 PM]