World Community Grid Forums
Thread Status: Active Total posts in this thread: 14
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline |
"I am still considering that MIP1 is really not OK. There is no acceptable justification for negative interactions between projects (except regarding performance)."
It does seem to work the other way too. That is, I started running Ebola again after several days without it, along with MIP. I got a few errors on Ebola (about one out of ten), but also, for the first time, I picked up three errors on MIP. I don't recall ever seeing that when they run by themselves.
One thing I do a little differently from most people is that I run Folding on a GPU (GTX 1070) and reserve a CPU core for it. Normally there is no interaction with the BOINC projects, but I have seen it on rare occasions. However, I consider Folding high priority and will not be stopping it to check. So my experience may not be the same as others', but yes, MIP is a little more problematic than some.
The MIP errors were all the same: <core_client_version>7.12.0</core_client_version> [Edit 1 times, last edit by Jim1348 at Oct 20, 2018 1:44:46 PM] |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7633 Status: Offline |
process got signal 11</message>
This is the key message. When I have gotten this message in the distant past, it was an indication that some part of the machine was bottlenecked in some way. Some processes were competing for the same resource at the same time, and the software was not able to handle the conflict smoothly. I don't know where in the system the conflict may have occurred, but it only happened with CEP2 (which was very resource intensive) on one system. Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline |
"I don't know where in the system the conflict may have occurred, but it only happened with CEP2 (which was very resource intensive) on one system."
Interesting. I started using a ramdisk or write-cache back in the CEP2 days (to protect the SSD), and it generally avoided problems. I still use a 12 GB write cache on the Ryzen 1700 (Ubuntu 18.04). But with 32 GB of main memory, I still have 22 GB free at the moment (not all the cache is used). That is with four MIP running and all the other cores busy on WCG or Folding/GPU. So it is probably some resource other than memory that is in conflict, though I have no idea what.
I actually limit MIP to four at a time with an app_config to prevent problems with run times; they are currently averaging 1 hour 30 minutes, and the maximum is under 3 hours. Maybe I will try limiting them to three at a time, though I think I will just drop Ebola first. It is near the end of its run, and they don't really need me now. Thanks. |
||
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline |
I picked up another Signal 11 error on MIP this morning, even though I was not running any OET. So I will limit MIP to running three at a time with the app_config, and also set "Number of workunits per host for the Microbiome Immunity Project?" to 12, in order to prevent too many from downloading. That should fix it.
I have never gotten an MIP error on my i7-4771 (Win 7 64-bit), though MIP is usually limited to running two at a time there. It could be that the Intel machines are more resistant to MIP errors. I will be building a Ryzen 2700 shortly, and will see how it goes there. |
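For anyone wanting to try the same limit, here is a minimal sketch of what such an app_config.xml might look like. The short app name `mip1` is an assumption on my part (check the event log or client_state.xml for the name your client actually reports), and the value 3 simply mirrors the limit described above.

```xml
<!-- Sketch only: goes in the World Community Grid project folder
     inside the BOINC data directory, saved as app_config.xml -->
<app_config>
  <app>
    <!-- Assumed short name for the Microbiome Immunity Project app -->
    <name>mip1</name>
    <!-- Run at most this many MIP tasks at the same time -->
    <max_concurrent>3</max_concurrent>
  </app>
</app_config>
```

After saving the file, the client should pick it up when you use "Read config files" in BOINC Manager or restart the client. Note that max_concurrent only limits how many tasks run at once, not how many are downloaded, which is why the separate workunits-per-host setting on the website is still useful.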