TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
MIP WUs erroring out

After having had to deal with those strange hanging SCC1 WUs for ages without a solution or explanation, I noticed a new issue today on (so far) one of my remote crunching hosts.
The machine in question is an i5 with 8GB of RAM running Windows 10 Pro 1803. All MIP WUs are erroring out, while at the same time it crunches MCM and FAH2 WUs without any problem.
Has anyone else seen such behavior?

Ralf
[Jan 27, 2019 6:17:47 AM]
chandanprakash2002@yahoo.com
Cruncher
United Kingdom
Joined: Feb 8, 2017
Post Count: 4
Re: MIP WUs erroring out

Hi, where do you see that? On the status page of each of those work units? I don't recall seeing any such errors. I have two i5s, one of them on Windows 10.
[Jan 27, 2019 10:33:14 AM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 328
Re: MIP WUs erroring out

I have two Win 10 PCs on which MIP runs perfectly until I need to shut down or restart them. When BOINC starts again, most MIP work units fail with a computation error. I can run for a few days by hibernating instead of shutting down, but Windows, anti-virus and other program updates demand a restart. I have given up on the MIP sub-project on Win 10. The problem must lie in the way I have set up Win 10; otherwise it would be causing widespread chaos for WCG.
One of the PCs also boots into Linux Mint where MIP runs and restarts perfectly.
[Jan 27, 2019 12:30:54 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Re: MIP WUs erroring out

Hi, where do you see that? On the status page of each of those work units? I don't recall seeing any such errors. I have two i5s, one of them on Windows 10.
They show up under "Error" on the "Result Status" page. When I click on the "Error" link, it shows:
Result Name: MIP1_ 00156656_ 0634_ 0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>mip1_image02_7.16.tga</file_name>
<error_code>-200 (wrong size)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>mip1_image08_7.16.tga</file_name>
<error_code>-200 (wrong size)</error_code>
</file_xfer_error>

</message>
]]>
As mentioned before, it gets and processes WUs from other WCG projects just fine (and in case anyone thinks there is not enough drive space, the box still has 650GB of a 1TB hard drive available).
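
For anyone comparing several of these result logs, a minimal Python sketch for pulling the failing file names and error codes out of the text follows. It is only an illustration: the sample string is copied from the log above, the function name `transfer_errors` is made up for this example, and the pattern assumes every `<error_code>` line has the "number (reason)" shape seen here.

```python
import re

# Sample text copied from the result log quoted above.
RESULT_LOG = """\
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>mip1_image02_7.16.tga</file_name>
<error_code>-200 (wrong size)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>mip1_image08_7.16.tga</file_name>
<error_code>-200 (wrong size)</error_code>
</file_xfer_error>

</message>
"""

# One match per <file_xfer_error> block. The log as a whole is not
# well-formed XML, so a regular expression is used instead of an XML parser.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>"
)


def transfer_errors(log_text):
    """Return (file name, numeric code, reason) for each transfer error."""
    return [
        (m.group("name"), int(m.group("code")), m.group("reason"))
        for m in XFER_ERROR.finditer(log_text)
    ]


for name, code, reason in transfer_errors(RESULT_LOG):
    print(f"{name}: {code} ({reason})")
```

Run on logs from several hosts, a listing like this makes it easier to see whether the same files or the same error codes keep coming up.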

Ralf
[Feb 2, 2019 7:41:45 AM]
TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 286
Re: MIP WUs erroring out


<core_client_version>7.2.47</core_client_version>

Isn't that a bit old?
Running 7.14.2 here on Windows 10 which I think is still the current WCG version...

Time for a BOINC software update?
[Feb 2, 2019 8:15:53 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Re: MIP WUs erroring out


<core_client_version>7.2.47</core_client_version>

Isn't that a bit old?
Running 7.14.2 here on Windows 10 which I think is still the current WCG version...

Time for a BOINC software update?
It's a remote host that I can't easily update. That same version runs on other hosts at that site and, again, all other projects run fine.

And I just checked another host that shows errors (though only occasionally), also running Windows 10 (i7, 8GB RAM, plenty of drive space):
Result Log

Result Name: MIP1_ 00154580_ 0894_ 0--
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>mip1.MIP1_00154580.1</file_name>
<error_code>-119 (md5 checksum failed for file)</error_code>
<error_message>MD5 check failed</error_message>
</file_xfer_error>
</message>
]]>

Ralf
[Feb 2, 2019 2:52:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: MIP WUs erroring out

Currently running 100% MIP1 on 15 hosts (14 Linux and 1 Windows 7) and none have received an error in the past 30 days. The first two files you reported were image files for the screen saver. I deleted those 2 files from one of my hosts and WCG resent them to that host just fine. It looks like there might be a problem in transmission, such as dropped blocks or bits getting changed somehow. MIP1 has some large download files in its WUs that can be as much as 17MB.
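
To illustrate how damage in transit would surface as the two errors reported in this thread ("wrong size" and "md5 checksum failed"), here is a rough Python sketch of that kind of check. It is not BOINC's own code: `check_download` is a hypothetical helper, and the expected size and MD5 are assumed to be known in advance.

```python
import hashlib
import os
import tempfile


def check_download(path, expected_size, expected_md5):
    """Return a list of problems found with the file at `path` (empty if none)."""
    problems = []
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(f"wrong size: got {actual_size}, expected {expected_size}")
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large input files need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_md5.lower():
        problems.append("md5 checksum mismatch")
    return problems


# Demonstration with a throwaway file standing in for a downloaded input.
payload = b"example work unit data"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
good_md5 = hashlib.md5(payload).hexdigest()

print(check_download(tmp.name, len(payload), good_md5))  # intact copy

with open(tmp.name, "r+b") as handle:
    handle.write(b"E")  # overwrite the first byte to simulate corruption

print(check_download(tmp.name, len(payload), good_md5))  # corrupted copy
os.remove(tmp.name)
```

A changed byte leaves the size intact but breaks the checksum, while a dropped block would change the size as well, so the two error codes may simply be two symptoms of the same transfer problem.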
[Feb 2, 2019 4:48:16 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Re: MIP WUs erroring out

15 hosts (14 Linux and 1 Windows 7)

This is off topic, but you must have a whale of an electric bill.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Feb 2, 2019 7:20:23 PM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 2173
Re: MIP WUs erroring out

The first two files you reported were image files for the screen saver. I deleted those 2 files from one of my hosts and WCG resent them to that host just fine.
Why would the screen saver files cause a WU (MIP1_ 00156656_ 0634_ 0--) to be dropped with an error? And that was just a random file on that one host. All other WUs (MCM, FAH2) are working just fine. I can access that machine remotely just fine, and the client is using the machine with all apps (like a couple hundred browser tabs at times) just fine.
And it wasn't until the other day that I noticed that a lot more hosts seem to have been showing more errors recently, like the one that I posted later in response to the claim that the BOINC version would be the likely cause...

Ralf
[Feb 3, 2019 6:45:26 AM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7846
Re: MIP WUs erroring out

The first two files you reported were image files for the screen saver. I deleted those 2 files from one of my hosts and WCG resent them to that host just fine.
Why would the screen saver files cause a WU (MIP1_ 00156656_ 0634_ 0--) to be dropped with an error? And that was just a random file on that one host. All other WUs (MCM, FAH2) are working just fine. I can access that machine remotely just fine, and the client is using the machine with all apps (like a couple hundred browser tabs at times) just fine.
And it wasn't until the other day that I noticed that a lot more hosts seem to have been showing more errors recently, like the one that I posted later in response to the claim that the BOINC version would be the likely cause...

Ralf

Just a thought here, but when was the last time any of these hosts was rebooted? On Windows, there have been reports of some memory not being released when a work unit finishes. It might not be much, but over time it adds up. It is good practice to reboot periodically in order to reset everything back to a clean start. This may also affect Linux hosts, but I have had systems running for many months at a time without problems.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Feb 3, 2019 12:33:05 PM]