| World Community Grid Forums
Thread Status: Active Total posts in this thread: 36
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
You might be onto something... 8 CEP2 tasks starting concurrently [e.g. after boot] could bring a classic HD substantially to its knees... making the system appear frozen, whilst actually the HD I/O is taking all the CPU load. If the BOINC data dir is also on the SSD, maybe a different story. At any rate, if the techs are reading, there's still that open dev ticket raised with Berkeley to have staggered starting for heavy apps. Must be past its 3rd anniversary by now.
|
||
|
|
Wolf Fivousix
Cruncher Joined: Mar 22, 2013 Post Count: 10 Status: Offline Project Badges:
|
Hello everyone, thank you for all the information. Let me try to address every question and idea raised without making this post too long:

Mr. Kermit, Memtest has been running for 16h and no errors have been found so far (typing this from my laptop). I don't use the BOINC screen saver, but the freezing may happen at any moment, be it reading the forum, watching a video, in screen saver, or even with the display "off" (once it has been in screen saver for too long it just displays a black screen). Shortage of space on the HD is something I never really considered, as I don't like to use more than 75% of the available space, but I'll try filling up the HD and see if that makes things worse. I have never tried switching to a less demanding project, as I have always focused on CEP. That is a test I will also be performing, and I will get back to you with the fragmentation state of the HD.

Sqt. Joe, I have not yet tried reducing the core count on the Asus Sabertooth MB, as it showed no effect on the previous motherboards, but I will do it just to be sure and come back with the results.

Scribe, I am very sorry for that, but there was no "hardware" subforum and I assumed this would be the place where I could get the most help from people who understand hardware.

Seke Rob, how can I set Windows to throttle BOINC? Everything related to BOINC is on the HD; the SSD holds only Windows and a couple of games that I play on a regular basis. Do you think re-installing BOINC on the SSD could make a difference?

Katoda, I don't have any kind of special equipment. Is there any way (such as software) that I could test my PSU?

Coleslaw, I have indeed always run CEP (and now CEP2). What is the workunit you talk of? The name of the tasks I currently have running? The application for all tasks has been "The Clean Energy Project - Phase 2 7.00" for a couple of months. I have no record of the ones I used to run previously, sorry =S.

Once again, thank you very much everyone for the overwhelmingly positive response. Since this is an intermittent problem, I cannot guarantee that anything will work or not unless I find something defective (like my previous motherboard), but I WILL be trying all the suggestions one by one to see if I can, at least, reproduce the freezing problem. |
||
|
|
MrKermit
Advanced Cruncher Joined: Jun 13, 2009 Post Count: 95 Status: Offline Project Badges:
|
Try setting up a device profile in WCG from the website. Go to "settings" [device manager] [device profiles] and pick _Default_ or a custom profile.
There are dozens of ways to play here, tuning the amount of CPU, RAM, etc. Since the machine is crashing while you use it, I would consider changing:
Leave applications in memory while suspended
But that isn't recommended for CEP2:
"5. Users who choose to run this project are encouraged to set the 'Leave applications in memory while suspended' option in their device profile"
I can't find it now, but somewhere you can set the maximum number of CEP2 units to run simultaneously. It defaults to 1, so I am guessing you have changed it before. Try dialing it down, which will reduce the RAM needs.
HTH
MrKermit |
||
|
|
MrKermit
Advanced Cruncher Joined: Jun 13, 2009 Post Count: 95 Status: Offline Project Badges:
|
Found it... toward the bottom of device profiles....
"Project Specific Settings
Number of workunits per host for The Clean Energy Project - Phase 2?
The Clean Energy Project - Phase 2 is limited on how many workunits a computer can have downloaded at a time and what minimum bandwidth the computer must have in order to receive any workunits. Changing this value from the default of 1 will override both of these restrictions."
There are several Google threads saying 4 is a good setting unless you have an SSD for WCG to thrash, and this thread was even cooler...
http://www.xtremesystems.org/forums/showthrea...n-100-CEP2-WUs-Here-s-how
Also, check the Windows Event Viewer to see if it is crying about anything... just in case there is a hint there.
Can't wait to hear your findings :) |
||
|
|
KLiK
Master Cruncher Croatia Joined: Nov 13, 2006 Post Count: 3108 Status: Offline Project Badges:
|
Also, here are the instructions on how to write app_config.xml and limit CEP2 to 4 cores only:
http://boinc.berkeley.edu/trac/wiki/ClientAppConfig
I would use something like this on an 8-core machine:
<app_config>
  <app>
    <name>mcm1</name>
    <user_friendly_name>Mapping Cancer Markers</user_friendly_name>
    <max_concurrent>7</max_concurrent>
  </app>
  <app>
    <name>cep2</name>
    <user_friendly_name>The Clean Energy Project - Phase 2</user_friendly_name>
    <max_concurrent>4</max_concurrent>
  </app>
  <app>
    <name>fahv</name>
    <user_friendly_name>FightAIDS@Home - Vina</user_friendly_name>
    <max_concurrent>7</max_concurrent>
  </app>
  <app>
    <name>ugm1</name>
    <user_friendly_name>Uncovering Genome Mysteries</user_friendly_name>
    <max_concurrent>7</max_concurrent>
  </app>
  <app>
    <name>oet1</name>
    <user_friendly_name>Outsmart Ebola Together</user_friendly_name>
    <max_concurrent>7</max_concurrent>
  </app>
</app_config> |
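For anyone who would rather generate the file than hand-edit it, here is a minimal sketch in Python. It assumes the same short app names and limits as the post above and leaves out the user_friendly_name lines, which a later reply in this thread notes the client ignores. The function name `build_app_config` is made up for illustration, and `ET.indent` needs Python 3.9 or newer.

```python
import xml.etree.ElementTree as ET


def build_app_config(limits):
    """Return app_config.xml text with one <app> entry per short app name.

    `limits` maps an app's short name (e.g. "cep2") to its max_concurrent value.
    """
    root = ET.Element("app_config")
    for name, max_concurrent in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root, space="  ")  # pretty-print; available from Python 3.9
    return ET.tostring(root, encoding="unicode")


# Limits taken from the post above for an 8-core machine: CEP2 capped at 4.
limits = {"mcm1": 7, "cep2": 4, "fahv": 7, "ugm1": 7, "oet1": 7}
xml_text = build_app_config(limits)
print(xml_text)
```

Save the printed text as app_config.xml in the location described on the ClientAppConfig page linked above; the sketch deliberately does not write the file itself, since the right folder depends on your install.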
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Wolf Fivousix, based on the reply you gave [intermittent], and other suggestions, I propose:
1) Set the client not to compute when the device is in use [make sure the Activity menu option is set to 'Run based on preferences']. You can set the resume delay to, for instance, 5 minutes.
2) 'Leave application in memory when suspended' can/should be left on [too much progress is lost if tasks are unloaded, and there is drama when they all resume from checkpoint again, or start]. Eventually, if the client is paused, the task models are migrated to the VM/swap file on disk [as long as that can grow big enough to hold the potentially 1.2GB per task being computed]. Recommended RAM allowed for CEP2 is 1GB, but with 8 concurrent you can probably do with 5-6GB RAM without inducing disk swapping while computing. I think the specs you gave in the OP suggest you have 16GB at your disposal... plenty to avoid disk swapping during computing!
3) For TThrottle, a third-party utility written for BOINC, you have to make sure that BOINC itself is set to use 100% of the time (Use at most 100% of CPU time). Set a ceiling temp, say 65C, and the utility will slow down BOINC to stay below that temp point.
4) When deciding to limit the number of concurrent CEP2 tasks, there's of course the app_config way and the preference 'On multiprocessor systems, use at most nn% of processors'. The question is whether you want to substitute light work on the non-CEP2 computing cores [OET/FAAH/MCM/UGM] or let cores idle to focus on CEP2 alone [and reduce operating temps along the way and be able to game while BOINCing]. If you want to go for light work in substitution, then app_config is the better route.
5) Reinstalling to the SSD can significantly help with bottlenecked storage I/O [checkpoint files are big]. Uninstall BOINC, copy the BOINC data dir and subs to the SSD, usually C:\ProgramData\BOINC\, and install BOINC again, pointing it to the new data_dir location. You will then be able to continue computing where you left off, noting that you had better pull the internet connection until you are happy with the relocation of BOINC. The servers are very sensitive to connections and to not finding the assigned task inventory on the host. Given that, the safeguard is to run the client dry before doing the move.
KLiK, I think I mentioned it before: <user_friendly_name> is -not- a valid app_config tag. It's ignored, which is what BOINC does for any illegal tag. The only function it could have is for the user to be reminded what cep2 etc. stands for ;<0. |
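The RAM figures in point 2 can be checked with a back-of-the-envelope calculation. This is only a sketch using the numbers quoted in the post (1GB recommended per task, potentially 1.2GB per task, 8 concurrent tasks, 16GB installed); the function name `ram_needed_gb` is made up for illustration, and real usage will vary per task.

```python
def ram_needed_gb(tasks, per_task_gb):
    """Upper-bound RAM estimate: every task at its per-task figure at once."""
    return tasks * per_task_gb


TASKS = 8            # concurrent CEP2 tasks
RECOMMENDED_GB = 1.0  # recommended RAM allowed per CEP2 task (from the post)
PEAK_GB = 1.2         # "potentially 1.2GB per task" (from the post)
INSTALLED_GB = 16     # RAM the OP is thought to have

at_recommended = ram_needed_gb(TASKS, RECOMMENDED_GB)
at_peak = ram_needed_gb(TASKS, PEAK_GB)
headroom = INSTALLED_GB - at_peak

print(f"All tasks at the recommended allowance: {at_recommended:.1f} GB")
print(f"All tasks at the potential peak:        {at_peak:.1f} GB")
print(f"Headroom left at the peak:              {headroom:.1f} GB")
```

Even the pessimistic case, with all 8 tasks at their potential peak at the same moment, stays well under 16GB, which is consistent with the estimate that 5-6GB is usually enough in practice.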
||
|
|
katoda
Senior Cruncher Poland Joined: Apr 28, 2007 Post Count: 172 Status: Offline Project Badges:
|
"Katoda, I don't have any kind of special equipment, is there anyway (like a software) that I could test my PSU?"
If you do not have a PSU tester, then the only 100% sure method is to swap in another PSU for some time and observe how the system behaves. I'm not sure whether software like HWMonitor (which gives a nice overview of the various voltages inside the PC) can help in the case of a problematic PSU. |
||
|
|
PMH_UK
Veteran Cruncher UK Joined: Apr 26, 2007 Post Count: 786 Status: Offline Project Badges:
|
One thing I would try is booting from a Linux Live CD/DVD/USB, "install" BOINC and run with that for a while.
I have seen Windows installations develop issues after power loss, MS updates or just "normal" running.
Paul.
|
||
|
|
Coleslaw
Veteran Cruncher USA Joined: Mar 29, 2007 Post Count: 1343 Status: Offline Project Badges:
|
Wolf Fivousix, the reason I mention the HDD being the issue is that I have seen first-hand drives that tested fine and used to work with CEP2 suddenly stop working with CEP2. Later down the road, the drives started showing more signs. However, replacing the HDD fixed the issues that I saw. Too many I/Os on a "spinner" drive (traditional HDD) can and will bring a system to a halt. I have done it many times on my 32-thread system that had BOINC on an old disk. I have since replaced that disk. I have even had it happen on good disks, as not too long ago LHC@home had some work units that were very I/O intensive. They brought a couple of my servers down before I found out what the problem was.
As far as the work unit I speak of, I meant which app. In this case the answer was CEP2. I asked for clarification because you did not say if you were running any projects outside of WCG, and you did not specify which sub-projects of WCG you were having trouble with. But now we know, and that is a strong clue to the issue at hand. |
||
|
|
PMH_UK
Veteran Cruncher UK Joined: Apr 26, 2007 Post Count: 786 Status: Offline Project Badges:
|
Have you checked the HD?
SMART stats can be checked by diagnostic s/w from WD: http://support.wdc.com/product/download.asp?groupid=605&sid=3&lang=en (check that it is the correct one for your drive)
Paul.
|
||
|
|
|