| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 15
|
|
|
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Offline Project Badges:
|
After having had to deal with those strange hanging SCC1 WUs for ages without a solution or explanation, I noticed a new issue today on (so far) one of my remote crunching hosts.
The machine in question is a Windows 10 Pro 1803 box, an i5 with 8GB of RAM. All MIP WUs are just erroring out, while at the same time it crunches MCM and FAH2 WUs without any problem. Has anyone else seen such behavior? Ralf |
||
|
|
chandanprakash2002@yahoo.com
Cruncher United Kingdom Joined: Feb 8, 2017 Post Count: 4 Status: Offline Project Badges:
|
Hi, where do you see that? On the status of each of those work units? I don't recall seeing any such errors. I do have two i5s, one of them on Windows 10.
|
||
|
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 328 Status: Offline Project Badges:
|
I have two Win 10 PCs on which MIP runs perfectly until I need to shut down or restart them. When BOINC starts again, most MIP work units fail with a computation error. I can run for a few days by using the hibernate facility instead of shutting down, but Windows, anti-virus and other program updates demand a restart. I have given up the MIP sub-project on Win 10. The problem must be the way I have set up Win 10; otherwise it would cause widespread chaos for WCG.
One of the PCs also boots into Linux Mint where MIP runs and restarts perfectly. |
||
|
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Offline Project Badges:
|
Quote: Hi, where do you see that? On the status of each of those work units? I don't recall seeing any such errors. I do have two i5s, one of them on Windows 10.
They show up under "Error" on the "Result Status" page. When I click on the "Error" link it shows:
Result Name: MIP1_ 00156656_ 0634_ 0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>mip1_image02_7.16.tga</file_name>
<error_code>-200 (wrong size)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>mip1_image08_7.16.tga</file_name>
<error_code>-200 (wrong size)</error_code>
</file_xfer_error>
</message>
]]>
As mentioned before, it processes WUs from other WCG projects just fine (and in case someone thinks that there is not enough drive space, the box has 650GB of a 1TB hard drive still available). Ralf |
||
|
|
TonyEllis
Senior Cruncher Australia Joined: Jul 9, 2008 Post Count: 286 Status: Offline Project Badges:
|
Quote: <core_client_version>7.2.47</core_client_version>
Isn't that a bit old? Running 7.14.2 here on Windows 10, which I think is still the current WCG version... Time for a BOINC software update?
Run Time Stats https://grassmere-productions.no-ip.biz/
|
||
|
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Offline Project Badges:
|
Quote: <core_client_version>7.2.47</core_client_version> Isn't that a bit old? Running 7.14.2 here on Windows 10, which I think is still the current WCG version... Time for a BOINC software update?
And I just checked another host that shows errors (more occasionally), also running Windows 10 (i7, 8GB RAM, plenty of drive space).
Result Log
Result Name: MIP1_ 00154580_ 0894_ 0--
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>mip1.MIP1_00154580.1</file_name>
<error_code>-119 (md5 checksum failed for file)</error_code>
<error_message>MD5 check failed</error_message>
</file_xfer_error>
</message>
]]>
Ralf [Edit 1 times, last edit by TPCBF at Feb 2, 2019 2:56:06 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Currently running 100% MIP1 on 15 hosts (14 Linux and 1 Windows 7) and none have received an error in the past 30 days. The first two files you reported were image files for the screen saver. I deleted those 2 files from one of my hosts and WCG resent them to that host just fine. It looks like there might be a problem in the transmission, like dropped blocks or bits getting changed somehow. MIP1 has some large download files in its WUs that can be as much as 17MB.
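For anyone who wants to check by hand whether a downloaded file really arrived damaged, here is a minimal Python sketch of the two checks named in the error logs above ("wrong size" and "md5 checksum failed"). It is not how the BOINC client does its own verification. The function names `file_md5` and `check_download` are made up for this example, and the expected size and MD5 are assumed to come from a known-good copy of the same file, for instance on a host where MIP1 runs without errors.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB pieces so large input files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path, expected_size, expected_md5):
    """Compare a file against a known-good size and MD5 (hypothetical helper).

    Returns a list of problems; an empty list means the file matches.
    """
    problems = []
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        problems.append(f"wrong size: {actual_size} != {expected_size}")
    actual_md5 = file_md5(path)
    if actual_md5 != expected_md5.lower():
        problems.append(f"md5 mismatch: {actual_md5} != {expected_md5}")
    return problems
```

If the same file keeps failing this check on one host but matches on another, that would point at the download path or the local disk of the failing host rather than at the work unit itself.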
|
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
Quote: 15 hosts (14 Linux and 1 Windows 7)
This is off topic, but you must have a whale of an electric bill. Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
TPCBF
Master Cruncher USA Joined: Jan 2, 2011 Post Count: 2173 Status: Offline Project Badges:
|
Quote: The first two files you reported were image files for the screen saver. I deleted those 2 files from one of my hosts and WCG resent them to that host just fine.
Why would the screen saver files cause a WU (MIP1_ 00156656_ 0634_ 0--) to be dropped with an error? And that was just a random file on that one host. All other WUs, MCM and FAH2, are working just fine. I can access that machine remotely just fine, and the client is working the machine with all apps (like a couple hundred browser tabs at times) just fine. And it wasn't until the other day that I noticed that a lot more hosts seem to have been developing more errors recently, like the one that I posted later in response to the claim that the BOINC version would be the likely cause... Ralf |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
Quote: The first two files you reported were image files for the screen saver. I deleted those 2 files from one of my hosts and WCG resent them to that host just fine.
Quote: Why would the screen saver files cause a WU (MIP1_ 00156656_ 0634_ 0--) to be dropped with an error? And that was just a random file on that one host. All other WUs, MCM and FAH2, are working just fine. I can access that machine remotely just fine, and the client is working the machine with all apps (like a couple hundred browser tabs at times) just fine. And it wasn't until the other day that I noticed that a lot more hosts seem to have been developing more errors recently, like the one that I posted later in response to the claim that the BOINC version would be the likely cause... Ralf
Just a thought here, but when was the last time any of these hosts was rebooted? If they are running Windows, there have been reports of some memory not being released when a work unit finishes. It might not be much, but over time it adds up. It is good practice to reboot periodically in order to reset all your parameters back to a clean start. This may also affect Linux hosts, but I have had systems running for many months at a time without problems. Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
|