World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: DDD2 Type B work units going out. |
Thread Status: Active Total posts in this thread: 369
PecosRiverM
Veteran Cruncher The Great State of Texas Joined: Apr 27, 2007 Post Count: 1053 Status: Offline Project Badges: |
Looks like they went really fast too.
---------------------------------------- |
||
|
I need a bath
Senior Cruncher USA Joined: Apr 12, 2007 Post Count: 347 Status: Offline Project Badges: |
Gosh, I hope this beta goes well. I really am looking forward to DDD2.
---------------------------------------- |
||
|
GIBA
Ace Cruncher Joined: Apr 25, 2005 Post Count: 5374 Status: Offline |
Got some of these new ones updated and released.
Hope that all goes smoothly, despite this being a beta test...
----------------------------------------
Cheers ! GIB@
Join BRASIL - BRAZIL@GRID team and be very happy ! http://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=DF99KT5DN1 |
||
|
HutchNYC
Advanced Cruncher United States Joined: Nov 27, 2005 Post Count: 97 Status: Offline Project Badges: |
I have a few questions about my current settings as they apply to these betas.

The PC I'm currently using is an i7-920 with 4 GB of RAM running 64-bit Vista. I run BOINC 24/7 with WCG as the only project. This machine has been running this way for almost a year now. I can usually browse the internet, check mail, work on spreadsheets, etc. while crunching, and have never had any noticeable lag or slowdown while doing this.

When I'm running these DDD2 betas, though, there is a VERY significant slowdown. The HD also seems to be in a perpetual read/write state. I'm not sure if this is due to pagefile/VM activity because of the higher memory requirements, or because of my checkpointing settings. (Most likely both.) I had always run with the "checkpoint at most every 10 seconds" setting. This is no doubt more frequent than I really need, as the PC stays on 24/7, but it had never been an issue before. I bumped the checkpoint interval up to 120 seconds, but I don't see any improvement in system performance.

This isn't a big deal while a few betas are running, but if users have DDD2 selected in their project mix, I can see the potential for complaints that "WCG is causing my system to crawl". Any ideas/thoughts/suggestions on the best settings for us to use once this project goes into regular production?

I'm not complaining. Just hoping you might have some suggestions that might make the lag less noticeable.

Thanks, Hutch
----------------------------------------
Semper Fi Click here to view or join team USMC
|
||
|
smeyer55
Senior Cruncher Joined: Feb 15, 2009 Post Count: 303 Status: Offline Project Badges: |
I got some on my I7-920. It looks like they're going to take about 3 hours to run.
I'm also seeing disk activity every few seconds with 8 betas running at the same time. I don't notice any system slowdown though. steve |
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2977 Status: Offline Project Badges: |
Good news, :) Credit should have been given to 672 results. These were the work units for the type B.se that were cancelled a few hours ago. Thanks, -Uplinger

Thanks for that, Uplinger. I know you didn't have to do it (after all, situations like this are all part of the "fun" of Beta testing), although it's much appreciated.
||
|
MrWizard
Cruncher Joined: Nov 16, 2004 Post Count: 4 Status: Offline Project Badges: |
There is too much disk activity by this program. You need to buffer your writes to solv*.trj and solv*.rst and not write to them constantly. There is no way I would let a production application run like this on my computers.
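
Editor's sketch of the buffering idea above, for illustration only. It is not the DDD2 application's code; the class name, thresholds, and the in-memory sink are all hypothetical. Records are collected in memory and written out in one batch once a size or age limit is reached, so the disk sees a few large writes instead of many small ones.

```python
import io
import time


class BufferedTrajectoryWriter:
    """Hypothetical helper: batch small records into fewer, larger disk writes."""

    def __init__(self, stream, max_bytes=1 << 20, max_age_s=60.0, clock=time.monotonic):
        self._stream = stream          # any binary file-like object
        self._max_bytes = max_bytes    # flush once this many bytes are pending
        self._max_age_s = max_age_s    # ...or once this many seconds have passed
        self._clock = clock
        self._pending = []
        self._pending_bytes = 0
        self._last_flush = clock()
        self.flush_count = 0           # number of real writes issued

    def write(self, record: bytes) -> None:
        """Queue a record; only touch the disk when a limit is reached."""
        self._pending.append(record)
        self._pending_bytes += len(record)
        too_big = self._pending_bytes >= self._max_bytes
        too_old = self._clock() - self._last_flush >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Write everything pending as a single block and reset the buffer."""
        if self._pending:
            self._stream.write(b"".join(self._pending))
            self._stream.flush()
            self.flush_count += 1
        self._pending.clear()
        self._pending_bytes = 0
        self._last_flush = self._clock()


def demo():
    """Write 100 records of 64 bytes through a 1024-byte buffer."""
    sink = io.BytesIO()  # stands in for a file such as a trajectory output
    writer = BufferedTrajectoryWriter(sink, max_bytes=1024, max_age_s=3600.0)
    for _ in range(100):
        writer.write(b"x" * 64)
    writer.flush()  # a checkpoint would force the remainder out
    return writer.flush_count, len(sink.getvalue())
```

The trade-off is that anything still in the buffer is lost on a crash, so a real application would presumably flush at each checkpoint.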
|
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges: |
I had always ran with the checkpoint at most every 10 seconds setting. This is no doubt quicker than I really need it to be as the pc stays on 24/7, but it had never been an issue before. I bumped the checkpoint up to 120 seconds, but I don't see any improvement in system performance. This isn't a big deal while a few beta's are running, but if users have DDD2 selected in their project mix I can see the potential problem of complaints that "WCG is causing my system to crawl". Any ideas/thoughts/suggestions on the best settings for us to use once this project goes into regular production?

An already-started task won't pick up the new "write to disk" setting until it has exited (been removed from memory), so you're likely still checkpointing once every 10 seconds. Since each task checkpoints independently, this basically means you're checkpointing about once per second, and if you're not running a fairly new BOINC client (v6.10.xx), each checkpoint triggers a rewrite of client_state.xml, which can kill performance if you've got many tasks (actually quite few for some of the WCG sub-projects that use lots of files per task...).

With v6.10.xx clients, checkpoint information goes to an individual file per task, so there's no BOINC overhead for frequent checkpointing... The application, on the other hand, can still write large files if it's programmed that way, and this can slow down performance.

In any case, there's generally no reason to use less than 60 seconds for the checkpoint interval, and if you continue having problems, try bumping it up to 10 minutes or so...

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
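
Editor's sketch of the arithmetic behind the point that each task checkpoints independently. The function name is hypothetical, and the model assumes every running task is able to checkpoint at exactly the configured interval, which a real application may not do.

```python
def aggregate_checkpoint_interval(per_task_interval_s: float, concurrent_tasks: int) -> float:
    """Average seconds between checkpoints across the whole client.

    Assumes each task checkpoints independently, once per interval.
    """
    if per_task_interval_s <= 0 or concurrent_tasks <= 0:
        raise ValueError("interval and task count must be positive")
    return per_task_interval_s / concurrent_tasks


# Eight concurrent tasks, as on an i7-920 with hyper-threading:
for interval in (10, 60, 600):
    gap = aggregate_checkpoint_interval(interval, 8)
    print(f"{interval:>4} s per task -> one checkpoint every {gap:g} s overall")
```

Under these assumptions, a 10-second setting with 8 tasks gives a checkpoint somewhere on the machine roughly every second, while the 60-second setting spaces them several seconds apart.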
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
I had always ran with the checkpoint at most every 10 seconds setting. This is no doubt quicker than I really need it to be as the pc stays on 24/7, but it had never been an issue before. I bumped the checkpoint up to 120 seconds, but I don't see any improvement in system performance.

HutchNYC,

1. Changing the interval takes effect upon restarting the client or on the next job. A job, once loaded into memory, retains the initial setting of 10 seconds.
2. There's been some back and forth with the developers about writing checkpoints system-wide (in your case, once per 10 seconds for the whole client) versus per job (once per 10 seconds for each job). With per-job checkpointing and 8 concurrent jobs, as outlined, that works out to about one checkpoint every 1.25 seconds, if there is a checkpoint to write.

The default of 60 seconds has a point. No one bothers too much about losing a minute on system restart if a checkpoint is written every minute or less. I've got it on 5 minutes, to keep that disk whirring down. In your case with 8 cores, that'd be an average loss of 2.5 minutes per job, IF the science is programmed for and able to checkpoint frequently. You would not want to do that with large checkpoint files.

As for the overall project concern... there will not be too many B types per target. The bulk of the project is C types, and as I understand it there will be a mix of A, B and C as the batches cycle through.

Edit: Had 4 of the pe type running concurrently... did not notice any slowdown, with 2.5 GB of RAM allowed for use by BOINC and LAIM on.

Edit: With a 5-minute write setting, MY client log looks like this (Ingleside does not like checkpoint logging... else he'd not be able to see the real problems ;-) ):

28/01/2010 07:32:43 World Community Grid [checkpoint_debug] result CMD2_0315-MYH14.clustersOccur-2IAE_B.clustersOccur_193_0 checkpointed
28/01/2010 07:33:38 World Community Grid [checkpoint_debug] result BETA_erlc_a218_pe0000_2 checkpointed
28/01/2010 07:33:45 World Community Grid [checkpoint_debug] result BETA_erlc_a189_pe0000_0 checkpointed
28/01/2010 07:37:21 World Community Grid [checkpoint_debug] result BETA_erlc_a215_pe0000_2 checkpointed
28/01/2010 07:38:05 World Community Grid [checkpoint_debug] result CMD2_0315-MYH14.clustersOccur-2IAE_B.clustersOccur_193_0 checkpointed
28/01/2010 07:38:52 World Community Grid [checkpoint_debug] result BETA_erlc_a189_pe0000_0 checkpointed
28/01/2010 07:38:54 World Community Grid [checkpoint_debug] result BETA_erlc_a218_pe0000_2 checkpointed
28/01/2010 07:42:37 World Community Grid [checkpoint_debug] result BETA_erlc_a215_pe0000_2 checkpointed
28/01/2010 07:43:08 World Community Grid [checkpoint_debug] result CMD2_0315-MYH14.clustersOccur-2IAE_B.clustersOccur_193_0 checkpointed
28/01/2010 07:43:56 World Community Grid [checkpoint_debug] result BETA_erlc_a189_pe0000_0 checkpointed
28/01/2010 07:43:59 World Community Grid [checkpoint_debug] result BETA_erlc_a218_pe0000_2 checkpointed
28/01/2010 07:47:44 World Community Grid [checkpoint_debug] result BETA_erlc_a215_pe0000_2 checkpointed
28/01/2010 07:48:17 World Community Grid [checkpoint_debug] result CMD2_0315-MYH14.clustersOccur-2IAE_B.clustersOccur_193_0 checkpointed
28/01/2010 07:48:58 World Community Grid [checkpoint_debug] result BETA_erlc_a189_pe0000_0 checkpointed
28/01/2010 07:49:13 World Community Grid [checkpoint_debug] result BETA_erlc_a218_pe0000_2 checkpointed
28/01/2010 07:52:45 World Community Grid [checkpoint_debug] result BETA_erlc_a215_pe0000_2 checkpointed
28/01/2010 07:53:22 World Community Grid [checkpoint_debug] result CMD2_0315-MYH14.clustersOccur-2IAE_B.clustersOccur_193_0 checkpointed
28/01/2010 07:53:59 World Community Grid [checkpoint_debug] result BETA_erlc_a189_pe0000_0 checkpointed
28/01/2010 07:54:24 World Community Grid [checkpoint_debug] result BETA_erlc_a218_pe0000_2 checkpointed
WCG Global & Research > Make Proposal Help: Start Here!
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 2 times, last edit by Sekerob at Jan 28, 2010 7:00:42 AM] |
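
Editor's sketch for checking a log like the one above: it groups `[checkpoint_debug]` lines by result name and reports the gap in seconds between successive checkpoints of each result. The regular expression and function name are assumptions based only on the line layout shown in this post (day/month/year timestamps), not on any documented BOINC log format.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: "DD/MM/YYYY HH:MM:SS <project> [checkpoint_debug] result <name> checkpointed"
LINE_RE = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) .*?\[checkpoint_debug\] result (\S+) checkpointed"
)


def checkpoint_intervals(log_text):
    """Return {result_name: [seconds between consecutive checkpoints]}."""
    times = defaultdict(list)
    for match in LINE_RE.finditer(log_text):
        stamp = datetime.strptime(match.group(1), "%d/%m/%Y %H:%M:%S")
        times[match.group(2)].append(stamp)
    return {
        name: [int((later - earlier).total_seconds())
               for earlier, later in zip(stamps, stamps[1:])]
        for name, stamps in times.items()
    }


# A few lines copied from the log in this post.
SAMPLE = """\
28/01/2010 07:33:45 World Community Grid [checkpoint_debug] result BETA_erlc_a189_pe0000_0 checkpointed
28/01/2010 07:37:21 World Community Grid [checkpoint_debug] result BETA_erlc_a215_pe0000_2 checkpointed
28/01/2010 07:38:52 World Community Grid [checkpoint_debug] result BETA_erlc_a189_pe0000_0 checkpointed
28/01/2010 07:42:37 World Community Grid [checkpoint_debug] result BETA_erlc_a215_pe0000_2 checkpointed
28/01/2010 07:43:56 World Community Grid [checkpoint_debug] result BETA_erlc_a189_pe0000_0 checkpointed
"""

if __name__ == "__main__":
    for name, gaps in checkpoint_intervals(SAMPLE).items():
        print(name, gaps)
```

On the sample lines, each gap comes out slightly above 300 seconds, which is consistent with a 5-minute setting that acts as a minimum interval per task.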
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Can someone tell me, what this means:
BETA_ erlc_ a029_ se0000_ 1-- 614 Server Aborted 27.01.10 16:27:02 27.01.10 21:51:31 0.00 0.0 / 0.0
BETA_ erlc_ a029_ se0000_ 2-- 614 Server Aborted 27.01.10 16:27:02 27.01.10 23:59:27 0.00 0.0 / 0.0
BETA_ erlc_ a029_ se0000_ 0-- 614 Too Late 27.01.10 16:26:59 27.01.10 22:32:00 3.76 45.3 / 45.3 <---- MINE

My result was returned 6 hours after it was sent out, and that is TOO LATE...???
||
|
|