| World Community Grid Forums
Thread Status: Active Total posts in this thread: 15
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline
|
15 hosts (14 Linux and 1 Windows 7
This is off topic, but you must have a whale of an electric bill. Cheers

I have been thinking for a while that he is retired from the oil industry and can afford it. |
||
|
|
PMH_UK
Veteran Cruncher UK Joined: Apr 26, 2007 Post Count: 786 Status: Offline
|
Ralf,
Those messages indicate corrupted files, possibly caused by errors on transfer or by the action of anti-virus software or similar. I thought files should be re-sent in such circumstances, but it may depend on when the error is detected. I suggest trying a reboot, then resetting the project if the issue persists (set "no new work" and drain the work units first). It may also be overheating or failing memory.
Paul.
|
||
|
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Recently Active
|
I would have to check when this host was last rebooted, but here is what contradicts the "memory not released" theory: why are WUs from all other projects working just fine? And why are similar errors now starting to show up on a whole bunch of other hosts as well, all of which are otherwise crunching along just fine? I will see if I can remotely reboot this system tonight, but somehow I am not confident that this will be a solution.

PMH_UK wrote: Ralf, Those messages indicate files corrupted, possibly by errors on transfer or action by anti-virus software or similar. I thought files should be re-sent in such circumstances but it may depend on when error is detected. Suggest try re-boot then reset project if issue persists (set "no new work" and drain workunits first). May also be over-heating or failing memory. Paul.

Besides, if WUs were not properly releasing memory, that would be something that needs to be fixed in the programming. The projects should not assume that someone has the time to constantly babysit and reboot systems left and right just on a lark...
Ralf |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7844 Status: Recently Active
|
TPCBF wrote: And why are similar errors now starting to show up on a whole bunch of other hosts as well, all of which are otherwise crunching along just fine?

Well, MIP uses memory heavily. I know that if you are running MIP exclusively, there will be an inordinate number of cache misses. (This information comes to me from someone much more knowledgeable on computer architecture than I am.) If other projects have a smaller or less intensive use of memory, doing the reboot will reset any locked-up memory locations. There may be some other reason that particular host is throwing errors only for the MIP project (heat, dust bunnies, transient voltage fluctuations, etc.), but it is beyond my range of knowledge. Good luck.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline
|
Sgt.Joe wrote: Well, MIP uses memory heavily. I know that if you are running MIP exclusively, there will be an inordinate number of cache misses.

I use an app_config.xml file (placed in the "www.worldcommunitygrid.org" ProgramData folder) to limit the number of MIPs running at a time to two. I do this to maintain efficiency, since otherwise the MIPs tend to interfere with each other and lengthen the run time. But it may also reduce errors; I currently have no errors showing on two machines (Ryzen 1700 and 2700) running under Ubuntu 18.04.1, with 15 cores each running WCG projects. As I and others have posted elsewhere, here is what works:

<app_config>
  <app>
    <name>mip1</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>

You then need to activate it, which I would do by a reboot in this case. |
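For anyone who would rather generate the file than type it by hand, the following is a minimal sketch, not an official tool. It only builds the same XML shown in the post above; the function name `build_app_config` is hypothetical, and where you save the result is up to you (the post above says the file belongs in the "www.worldcommunitygrid.org" project folder).

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that caps one app at max_concurrent tasks.

    build_app_config is a hypothetical helper for illustration; the element
    names (app_config, app, name, max_concurrent) are those quoted in the post.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # ET.indent exists only on Python 3.9+; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Same values as in the post: at most two mip1 tasks at a time.
    print(build_app_config("mip1", 2))
```

Redirect the printed output into a file named app_config.xml and copy it into the project folder yourself; the sketch deliberately does not guess at the folder path, since that differs between Windows and Linux installs.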
||
|
|
|