Thread Status: Active
Total posts in this thread: 12
Posts: 12   Pages: 2
This topic has been viewed 5431 times and has 11 replies.
smeyer55
Senior Cruncher
Joined: Feb 15, 2009
Post Count: 303
FAH2_002735_zinc16286661 units all error out

All my FAH2_002735_zinc16286661 units on several different machines are reporting Error status. The wingman is also getting an error.
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>755588a851cd77a541efa04a4947bb18.dms</file_name>
<error_code>-119 (md5 checksum failed for file)</error_code>
<error_message>MD5 check failed</error_message>
</file_xfer_error>
</message>
]]>

Steve
[May 3, 2020 8:31:26 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12128
Re: FAH2_002735_zinc16286661 units all error out

It looks like faulty units. Also, have you tried updating to 7.14.3?

Mike
[May 4, 2020 1:04:57 AM]
yoerik
Senior Cruncher
Canada
Joined: Mar 24, 2020
Post Count: 413
Re: FAH2_002735_zinc16286661 units all error out

Quoting smeyer55:
"All my FAH2_002735_zinc16286661 units on several different machines are reporting Error status. The wing man is also getting an error.
<core_client_version>7.14.2</core_client_version> ..."


Yeah, your client is out of date. That could be a major factor in what's causing the errors.
[May 4, 2020 1:40:43 AM]
alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 868
Re: FAH2_002735_zinc16286661 units all error out

I record the reasons for getting retries of FAH2 work, so I've been seeing this "second hand" for quite a while now (unfortunately I haven't been recording the exact O/S version and client version...). My Linux systems have been happily running retries on these batch 2735 jobs; almost all, if not all, of the failing tasks are on Windows 10 systems (I think there have been some Windows 7 ones, but not many). The majority of errors are checksum failures, but some of them are "wrong size" errors.
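For anyone wondering what those two errors mean: as far as I understand it, the BOINC client checks each downloaded input file against the size and MD5 checksum recorded for it in the workunit, so both a "wrong size" error and error -119 ("MD5 check failed") mean the copy on disk differs from what the server expected. The Python below is only a minimal sketch of that kind of check, not the client's actual code; `verify_input_file`, `md5_of_file` and the demo values are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_input_file(path, expected_size, expected_md5):
    """Hypothetical check modelled on the two failures seen in this thread.

    Returns "ok", "wrong size" or "MD5 check failed".
    """
    # The size is cheap to compare, so check it before hashing the whole file.
    if os.path.getsize(path) != expected_size:
        return "wrong size"
    if md5_of_file(path) != expected_md5.lower():
        return "MD5 check failed"
    return "ok"


if __name__ == "__main__":
    # Demo with a throwaway file; real expected values would come from the workunit.
    payload = b"example input data"
    with tempfile.NamedTemporaryFile(delete=False, suffix=".dms") as tmp:
        tmp.write(payload)
        path = tmp.name
    good_md5 = hashlib.md5(payload).hexdigest()
    print(verify_input_file(path, len(payload), good_md5))      # ok
    print(verify_input_file(path, len(payload) + 1, good_md5))  # wrong size
    print(verify_input_file(path, len(payload), "0" * 32))      # MD5 check failed
    os.remove(path)
```

Comparing the digest of the same .dms file on a failing Windows host and on a working Linux host would show whether the two machines really received different bytes.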

I don't think I've ever had an FAH2 job fail on any of my systems (not counting [very rare] system crashes or Server Aborts when prior users have taken just over the deadline to send in results!!!) So, either my Linux systems are picking up the files from somewhere different and getting fair copies or there is something about those jobs that upsets various clients on Windows 10...

I can't think of any reason why this particular batch should be giving so much grief, but if other batches were being equally problematic we'd be seeing a lot of users in here complaining! I wonder if the WCG tech people are aware of this issue?

@Mike Gibson - I have done retries for tasks that failed on various clients - the latest one I saw failed on 7.16.5 (! - not specific to one client, then...) but most of the failures I'm seeing lately are 7.14.3 so that's not likely to help!

@smeyer55 - A successful FAH2 job doesn't have a wingman; what you're seeing is a failure followed by a failed retry. Eventually it'll get sent either to a Linux system or to a Windows system that doesn't fail... I am frequently the third user to have tried one of these, as the first retry went to another Windows 10 system...

A recent case in point is FAH2_002735_zinc1628661_000004_000008_127, where the first task was on Windows 10 Core x64 Edition (18362) with client 7.14.3 and a "wrong size" error, and the first retry was on Windows 10 Enterprise x64 Edition (18363), again with client 7.14.3 and an "MD5 check failed" error. Both of them failed on the same file, which had one of those long hexadecimal file names and a .dms extension. Mine ran o.k.! (Ubuntu 18.04, client 7.14.2.)
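If you wanted to record these failure reasons automatically, one rough approach is to pull the fields out of the `<file_xfer_error>` block that appears in the task's stderr (the tag names below are copied from the log in the first post). This Python sketch uses a regular expression because that stderr output, as quoted here, isn't a single well-formed XML document. `parse_xfer_errors`, `summarise_failures` and the idea of feeding them saved stderr text are hypothetical, not part of BOINC.

```python
import re
from collections import Counter

# One <file_xfer_error> block, laid out as in the stderr quoted in this thread.
# <error_message> is treated as optional in case some failures omit it (an assumption).
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<file_name>[^<]*)</file_name>\s*"
    r"<error_code>(?P<error_code>[^<]*)</error_code>\s*"
    r"(?:<error_message>(?P<error_message>[^<]*)</error_message>\s*)?"
    r"</file_xfer_error>"
)


def parse_xfer_errors(stderr_text):
    """Return one dict (file_name, error_code, error_message) per transfer error found."""
    return [m.groupdict() for m in XFER_ERROR.finditer(stderr_text)]


def summarise_failures(stderr_texts):
    """Count transfer errors by error code across several tasks' stderr output."""
    counts = Counter()
    for text in stderr_texts:
        for err in parse_xfer_errors(text):
            counts[err["error_code"].strip()] += 1
    return counts


if __name__ == "__main__":
    sample = """<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>755588a851cd77a541efa04a4947bb18.dms</file_name>
<error_code>-119 (md5 checksum failed for file)</error_code>
<error_message>MD5 check failed</error_message>
</file_xfer_error>
</message>
]]>"""
    for err in parse_xfer_errors(sample):
        print(err["file_name"], err["error_code"])
    print(summarise_failures([sample]))
```

Grouping by `file_name` instead of `error_code` would show whether retries on different hosts keep failing on the same input file, as in the case above.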

Whatever the issue might be, it's getting to be almost as regular as the Windows 8 "Required Privilege not held" error (as seen from my retry-processing viewpoint)...

Hope something comes along to resolve this, but I'm at a loss to explain it (and have no Windows systems to look at for ideas...)

Cheers - Al.
[Edit 1 times, last edit by alanb1951 at May 4, 2020 1:53:13 AM]
[May 4, 2020 1:51:46 AM]
AgrFan
Senior Cruncher
USA
Joined: Apr 17, 2008
Post Count: 365
Re: FAH2_002735_zinc16286661 units all error out

This is not a client version issue. It happens on SCC and MIP also. I've been seeing checksum errors sporadically for several months on all my machines. The techs don't seem to be very concerned, probably because the retries process correctly. I do see every once in a while a unit where all retries fail and the unit stops being sent out. I would think the techs know when that happens.
[Edit 4 times, last edit by AgrFan at May 4, 2020 2:15:30 AM]
[May 4, 2020 2:09:51 AM]
yoerik
Senior Cruncher
Canada
Joined: Mar 24, 2020
Post Count: 413
Re: FAH2_002735_zinc16286661 units all error out

Quoting AgrFan:
"This is not a client version issue. It happens on SCC and MIP also. I've been seeing checksum errors sporadically for several months on all my machines. The techs don't seem to be very concerned, probably because the retries process correctly. I do see every once in a while a unit where all retries fail and the unit stops being sent out. I would think the techs know when that happens."

That it's happening on other WCG subprojects makes me more inclined to think it's an issue with your machine.

Unless you have another theory: given the number of workunits, what are the odds of you getting bad WUs that frequently, when those same WUs run fine on other PCs?
[May 4, 2020 5:22:46 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: FAH2_002735_zinc16286661 units all error out

Quoting yoerik:
"Yeah, your client is out of date. That could be a major factor in what's causing the errors."

The odds of that are about 0.00000001% or less. I'm running 7.14.2 with flawless FAHB results.
[May 4, 2020 10:45:40 AM]
smeyer55
Senior Cruncher
Joined: Feb 15, 2009
Post Count: 303
Re: FAH2_002735_zinc16286661 units all error out

I'm running other FAH work units fine on all the machines. It is just that particular series that is failing.

Steve
[May 4, 2020 12:56:38 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: FAH2_002735_zinc16286661 units all error out

I haven't had an error on FAHB for a very long time. I had a few of this particular 2735 batch in my log and they completed fine, but those were on April 22 and May 1.

World Community Grid 7.30 FightAIDS@Home - Phase 2 FAH2_002735_zinc16286661_000003_000037_133_1 05:07:31 (04:46:32) 5/1/2020 5:38:13 PM 5/1/2020 5:42:36 PM 93.18 Reported: OK
World Community Grid 7.30 FightAIDS@Home - Phase 2 FAH2_002735_zinc16286661_000002_000030_101_1 05:24:50 (04:51:08) 4/22/2020 6:12:56 PM 4/22/2020 6:53:50 PM 89.63 Reported: OK

The only issue I see is the occasional download failure on FAHB and MIP1, but I have no log of those in BOINCStats. It doesn't bother me.
[May 4, 2020 1:40:58 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7574
Re: FAH2_002735_zinc16286661 units all error out

This is just a shot in the dark, but a couple of years ago I had this same problem occurring sporadically on some machines at the far end of my wireless network. It was 99.9% cured when I upgraded from wireless G to wireless N. If you have a wired connection, the problem might be somewhere in the connections or maybe a defective cable, especially if it is only on one machine.
Just a thought.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at May 4, 2020 1:42:08 PM]
[May 4, 2020 1:41:25 PM]