Thread Status: Active
Total posts in this thread: 15
Posts: 15   Pages: 2   [ Previous Page | 1 2 ]
This topic has been viewed 3865 times and has 14 replies
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: MIP WUs erroring out

15 hosts (14 Linux and 1 Windows 7

This is off topic, but you must have a whale of an electric bill.
Cheers

I have been thinking for a while that he is retired from the oil industry and can afford it.
[Feb 3, 2019 3:59:30 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 786
Status: Offline
Re: MIP WUs erroring out

Ralf,

Those messages indicate corrupted files, possibly caused by transfer errors or by the action of anti-virus software or similar.
I thought files should be re-sent in such circumstances, but it may depend on when the error is detected.
I suggest trying a reboot, then resetting the project if the issue persists (set "no new work" and drain workunits first).
It may also be overheating or failing memory.

Paul.
----------------------------------------
Paul.
[Feb 3, 2019 5:45:26 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Status: Offline
Re: MIP WUs erroring out

Ralf,

Those messages indicate corrupted files, possibly caused by transfer errors or by the action of anti-virus software or similar.
I thought files should be re-sent in such circumstances, but it may depend on when the error is detected.
I suggest trying a reboot, then resetting the project if the issue persists (set "no new work" and drain workunits first).
It may also be overheating or failing memory.

Paul.
I would have to check when this host was last rebooted, but here is what contradicts the "memory not released" theory: why are WUs from all other projects working just fine? And why are similar errors now starting to show up on a whole bunch of other hosts as well, all of which are otherwise crunching along just fine? I will see if I can remotely reboot this system tonight, but somehow I am not confident that this will be a solution.
Besides, if WUs were not properly releasing memory, that would be something that needs to be fixed in programming. The projects should not assume that someone has the time to constantly babysit and reboot systems left and right just on a lark...

Ralf
[Feb 3, 2019 8:47:17 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7844
Status: Offline
Re: MIP WUs erroring out

And why are similar errors now starting to show up on a whole bunch of other hosts as well, all of which are otherwise crunching along just fine?
Well, MIP uses memory heavily. I know that if you are running MIP exclusively, there will be an inordinate number of cache misses. (This information comes to me from someone much more knowledgeable on computer architecture than I am.) Other projects may have a smaller or less intensive use of memory, which could explain why they still run fine; doing the reboot will reset any locked-up memory locations. There may be some other reason that particular host is throwing errors only for the MIP project, but it is beyond my range of knowledge. (Heat, dust bunnies, transient voltage fluctuations, etc.)
Good luck
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Feb 4, 2019 1:12:35 AM]   Link   Report threatening or abusive post: please login first  Go to top 
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Status: Offline
Re: MIP WUs erroring out

Well, MIP uses memory heavily. I know that if you are running MIP exclusively, there will be an inordinate number of cache misses.

I use an app_config.xml file (placed in the "www.worldcommunitygrid.org" ProgramData folder) to limit the number of MIPs running at a time to two. I do this to maintain efficiency, since otherwise the MIPs tend to interfere with each other and lengthen the run time. But it may also reduce errors; I currently have no errors showing on two machines (Ryzen 1700 and 2700) running under Ubuntu 18.04.1, with 15 cores each running WCG projects.

As I and others have posted elsewhere, here is what works:
<app_config>
  <app>
    <name>mip1</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>

You then need to activate it, which I would do by a reboot in this case.
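
For anyone who would rather script this than hand-edit the file, here is a minimal Python sketch that produces the same XML as above. The function name build_app_config is just something made up for illustration, not part of BOINC or WCG. It only builds the text; saving it as app_config.xml in the project folder is left to you, since the folder location differs between systems.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that caps one app at max_concurrent tasks.

    build_app_config is a hypothetical helper for this post, not a BOINC API.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    # Same element layout as the hand-written file above.
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Prints a config limiting mip1 to two concurrent tasks.
    print(build_app_config("mip1", 2))
```

Changing the second argument is all it takes to try a different limit, for example 3 instead of 2 on a machine with more cache.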
[Feb 4, 2019 3:07:56 AM]