Thread Status: Active
Total posts in this thread: 38
This topic has been viewed 311362 times and has 37 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

Thanks skgiven for your response.

I did not have start_delay in the config file, though it's there now. As I read it, it affects only client startup, so it would not matter afterward? With more VM, I forced a CEP2 task to start by suspending another. It took about 20 seconds and again caught another task (C4CW) in the "task exited, may need to restart" message, but this time only one of the other two that were running.

This host is limited to one CEP2 task. In practice it allows two because XP is using 3 cores for BOINC and I am running andLinux on the 4th, and it is running CEP2 also. This behavior however happens even if andLinux is shut down.

Haven't time to pursue this systematically right now but will come back to it in a couple weeks.
[Jan 19, 2011 8:00:31 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

I think this may (not pointing fingers yet) have been hosing my desktop, but I'm not 100% sure. I was frequently (at least every other day) finding that my computer had restarted itself, and Event Viewer was displaying the dreaded "the previous shutdown was unexpected" error (event ID 6008). I ran a bunch of diagnostics (memory, CPU, disk, etc.), all of which came up fine. After the last "unexpected restart", I was reading the forums and noticed this thread. The timing of the last shutdown corresponded to the startup of a CEP2 task (I also have the limit of 1 CEP2 task per host). I temporarily removed CEP2 from my project list and it hasn't happened since. I'm going to experiment a bit more by flipping it back on over some weekend and seeing what happens.

Intel Q6600 (stock speed), 8GB RAM, Vista Home Premium x64, BOINC 6.10.58
----------------------------------------
[Edit 2 times, last edit by Former Member at Jan 25, 2011 9:27:52 PM]
[Jan 25, 2011 9:22:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

OK I'm back on this again, sporadically available myself but the behavior is consistent whenever I check. I still see the 2 other cores' tasks restarting whenever I start a CEP task.

This time, I wanted to force a CEP task to run. I did this:
1. Suspended all non-running tasks except for CEP.
2. Suspended computing altogether.
3. Suspended the tasks that had been running (both DDDT-2).
4. Resumed computing.
5. Waited for CEP to start.
6. Once it was running, resumed the other tasks.
They still instantly did the "exited with no finish file" thing and restarted.
This happens only when a CEP job starts.
The page file is nowhere close to full now, so I'm clueless about what's doing this. The host becomes molasses for a good minute after a CEP task starts, until a while after it actually shows as Running.
This is a Q9450 on XP Pro, BOINC 6.10.58. LAIM is on.
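
For anyone who wants to repeat the sequence above without clicking through the manager, here is a rough Python sketch that only builds the matching boinccmd command lines and does not run them. It assumes the `--task URL NAME {suspend|resume}` and `--set_run_mode {never|auto}` options as described in the boinccmd documentation; older clients may name these differently, so check `boinccmd --help` first. The project URL and task names are hypothetical placeholders, and the waiting step in the middle is still manual.

```python
from typing import List, Tuple

BOINCCMD = "boinccmd"  # assumption: the BOINC command-line tool is on PATH


def task_op(project_url: str, task_name: str, op: str) -> List[str]:
    """Build one 'boinccmd --task URL NAME OP' command (op: suspend or resume)."""
    return [BOINCCMD, "--task", project_url, task_name, op]


def force_start_plan(
    project_url: str, target: str, running: List[str], queued: List[str]
) -> Tuple[List[List[str]], List[List[str]]]:
    """Return (before, after): commands to issue before and after the target starts."""
    # Suspend every non-running task except the one we want to start.
    before = [task_op(project_url, t, "suspend") for t in queued if t != target]
    # Suspend computing altogether.
    before.append([BOINCCMD, "--set_run_mode", "never"])
    # Suspend the tasks that had been running.
    before += [task_op(project_url, t, "suspend") for t in running]
    # Resume computing; only the target is left eligible to run.
    before.append([BOINCCMD, "--set_run_mode", "auto"])
    # Once the target shows as running, resume the tasks that were running before.
    after = [task_op(project_url, t, "resume") for t in running]
    return before, after


if __name__ == "__main__":
    # Hypothetical URL and task names, for illustration only.
    before, after = force_start_plan(
        "http://example.invalid/",
        target="cep2_task",
        running=["dddt2_a", "dddt2_b"],
        queued=["cep2_task", "other_1"],
    )
    for cmd in before + after:
        print(" ".join(cmd))
```

The sketch resumes only the tasks that had been running; the remaining queued tasks would need the same resume command afterwards.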
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 4, 2011 6:25:42 AM]
[Mar 4, 2011 6:24:37 AM]
LCB001
Advanced Cruncher
CANADA
Joined: Oct 14, 2009
Post Count: 69
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

I also have a Q9450 (@3.4 GHz, 4 GB RAM) and it has no problem running CEP2; it has run up to four of them with no issues.

How much RAM does your system have? It sounds like you have a bottleneck somewhere.

Maybe too many processes running in the background.

To force a task to start, just suspend all non-running tasks and one of the running ones. There is no need to suspend computation on everything...
----------------------------------------

[Mar 4, 2011 7:53:09 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

[Still] can't say I observe this bogarting on either my Windows 7 32-bit duo or my Linux 64-bit quad. LAIM is on, of course, as per the techs' advice. This is even with DDDT2 C types on the side. Per alpha client 6.12.14, these babies grab 750 MB of RAM each, same as CEP2... heavy models, I'd say. See http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,26026#315424

One thing I don't understand: if the total memory use of the BOINC tasks exceeded the idle/in-use setting, the client would pause one or more tasks and put them into "waiting for memory". Since that has not been observed, is the swap file big enough, and is a minimum size set? Mine has a minimum of 1.5 times RAM and unlimited growth. One would expect fully automatic VM management to expand quickly enough to meet ad hoc requirements, but who knows. In any case, my VMs are on a separate partition for best performance and to avoid fragmentation (a Windows issue, not a Linux issue).
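
To make the arithmetic concrete, here is a small Python sketch of the two checks mentioned above: the 1.5 times RAM rule of thumb for the minimum swap size, and whether a set of tasks fits inside the memory share BOINC is allowed to use. The 750 MB per task comes from the figure quoted earlier; the 4096 MB of RAM and the 50% allowance are made-up illustration values, and the real client's "waiting for memory" logic is certainly more involved than this.

```python
def min_page_file_mb(ram_mb: float, factor: float = 1.5) -> float:
    """Rule-of-thumb minimum swap size: factor times installed RAM."""
    return ram_mb * factor


def tasks_fit(ram_mb: float, per_task_mb: float, n_tasks: int,
              allowed_fraction: float) -> bool:
    """True if n_tasks of per_task_mb each stay within the allowed share of RAM."""
    budget_mb = ram_mb * allowed_fraction
    return per_task_mb * n_tasks <= budget_mb


if __name__ == "__main__":
    ram_mb = 4096      # hypothetical host with 4 GB of RAM
    allowed = 0.5      # hypothetical "use at most 50% of memory" preference
    print("minimum swap (MB):", min_page_file_mb(ram_mb))
    for n in (1, 2, 3):
        print(n, "task(s) of 750 MB fit:", tasks_fit(ram_mb, 750, n, allowed))
```

With these example numbers, two 750 MB tasks fit in the allowance and a third does not, which is the point where one would expect to see a task paused rather than crashed.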

And yes, the CEP2 jobs start by building the model, which takes time. Here on Linux it takes about 15-17 seconds, on the duo Centrino nearly a minute (1.5 GB RAM allowed), but since I've used ThreadmasterGUI to restrict the process to 90% per core (45 is the setting for a duo to get 90), it is barely noticeable. The true hitter is when the 3rd checkpoint is stored. Though I haven't noticed exactly when, it sometimes takes 10 minutes before the 4th (job #3) is under way. For reference, that 10 minutes is out of the 15-17 minutes I see on the Linux box for the whole task as the elapsed-minus-CPU time differential.

--//--

NB: PageDefrag used to work on XP and Vista, but refuses to run on W7, which is when I started putting the VM in its own partition. The BOINC data_dir is also in its own, btw.
[Mar 4, 2011 9:38:36 AM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

For CEP2 there is a lot of disk I/O during initialization, so any issues observed could be due to a storage problem. I would recommend running a hardware test on the hard drive to rule out any issues with it.

Thanks,
armstrdj
[Mar 4, 2011 2:35:01 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

The only reason I suspended computing altogether was to suspend the running DDDT2 tasks before the CEP2 task tried to start; I was hoping to keep the DDDT-2 tasks from exiting at that time. There is 4 GB on the host, which with XP means about 3.25 GB usable. The page file is huge and the RAID 1 is healthy. I bumped up the memory and page file settings for BOINC again and will see. Thanks

No change. How are the DDDT2 tasks exiting when they're suspended?
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 4, 2011 7:31:44 PM]
[Mar 4, 2011 7:27:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

They're not exiting as long as you set the recommended LAIM for CEP2. With LAIM set, the DDDT2 tasks are left in situ; without it, that again creates disk activity.

And setting the VM too huge does not help either. VM is best set to a minimum of 1.5 times RAM, in contiguous space, which reserved partitions guarantee. If the VM and the BOINC data dir are on physically different drives, that again improves performance... old hat stuff.

--//--
[Mar 4, 2011 8:09:27 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

Well, the tasks are exiting, according to the messages, and LAIM has been set for all the many years I've run BOINC. I assumed "exited with no finish file" basically meant the WU crashed and had to be restarted. The whole rigmarole of suspending computing, then suspending everything but the CEP2 task to be started, then resuming computing after CEP2 was well started, was my attempt to keep the DDDT-2 tasks from exiting, but it didn't work.

Even though all other tasks were suspended, when the CEP2 task started the 3 DDDT-2 tasks in memory all did the exited-with-zero-status-but-no-finished-file thing, before I even tried to resume them.

I should say that when I'm running only CEP2, the box can run 2-3 tasks, better with 2. It handles 3-4 DDDT-2 tasks just fine when running only those. It's just that any DDDT-2 tasks get crashed whenever a CEP2 task starts. And CEP2 takes a good 30-60 seconds to show as running in the client. Otherwise I've seen no performance issues with this system.
I can tweak the VM settings more, but larger did seem to help. What I meant in my last message was that I bumped up the percentage of the page file that BOINC is allowed to use, to no avail.
There is no other drive to use. The RAID 1 probably doesn't help performance-wise, but I was after recovery reliability more than performance in that case.
Appreciate everyone's thoughts. Still scratching my head.
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 5, 2011 1:07:48 AM]
[Mar 4, 2011 9:04:10 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Will CEP2 units eventually NOT Bogart the host when they start up?

@[B.S] sTrey: Your problem with the computer freezing when CEP2 starts up sounds similar to the problems I have with 1 of my 5 computers. In my case the CEP2 WU starts OK, but at some point afterwards it stops processing and the machine freezes for up to 2-3 minutes with the HDD activity LED on. Often, the WUs exit with zero status, and have to restart from their last checkpoints.
I also found this behaviour when I was running DDDT2 on all 4 cores.
Nobody has fully explained it or exactly how to get around it.
Legrandpiou is experiencing it too: see the CEP2 Forum thread "work units not finishing". I made a long post there about my own experiences on Mar 2 2011 (16th post). It has some links to other threads about this problem.
I think armstrdj is on the right track when he suggests it is a storage (hard drive) issue. In my case, the HDD manufacturer's diagnostics say that it is perfect. I think it is probably an HDD hardware design or onboard firmware issue. The size of the RAM buffer on the drive could also be a factor. My bad drive (WD SATA) has 32MB, and one of my other machines has a very similar drive with only 16MB and it is OK. If you have your BOINC data on a different physical drive to the system & pagefile, I think it is the drive with the BOINC data that is at fault.
In my other post, I suggested that we collect the details of the storage devices of computers that have these problems. It would probably be best to post your HDD details in the other thread, to keep our little database together.
I hope we can help each other to sort out the problem.
----------------------------------------
[Edit 2 times, last edit by Rickjb at Mar 5, 2011 10:13:40 AM]
[Mar 5, 2011 9:52:16 AM]