World Community Grid Forums
Category: Completed Research | Forum: FightAIDS@Home Phase 2 | Thread: FAH2_002735_zinc16286661 units all error out
Thread Status: Active Total posts in this thread: 12
smeyer55
Senior Cruncher Joined: Feb 15, 2009 Post Count: 303 Status: Offline
All my FAH2_002735_zinc16286661 units on several different machines are reporting Error status. The wingman is also getting an error.
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>755588a851cd77a541efa04a4947bb18.dms</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
  <error_message>MD5 check failed</error_message>
</file_xfer_error>
</message>
]]>
Steve
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12128 Status: Offline
It looks like faulty units. Also, have you tried updating to 7.14.3?
Mike
yoerik
Senior Cruncher Canada Joined: Mar 24, 2020 Post Count: 413 Status: Offline
Quoting smeyer55: "All my FAH2_002735_zinc16286661 units on several different machines are reporting Error status. The wingman is also getting an error. [...]"
Yeah, your client is out of date. That could be a major factor in what's causing the errors.
alanb1951
Veteran Cruncher Joined: Jan 20, 2006 Post Count: 868 Status: Offline
I record the reasons for getting retries of FAH2 work, so I've been seeing this "second hand" for quite a while now (unfortunately I've not been recording the exact OS version and client version...). My Linux systems have been happily running retries on these batch 2735 jobs; almost all, if not all, of the failing tasks are on Windows 10 systems (I think there have been some Windows 7 ones, but not many). The majority of errors are checksum failures, but some of them are "wrong size" errors.

I don't think I've ever had an FAH2 job fail on any of my systems (not counting [very rare] system crashes or Server Aborts when prior users have taken just over the deadline to send in results!!!). So either my Linux systems are picking up the files from somewhere different and getting good copies, or there is something about those jobs that upsets various clients on Windows 10... I can't think of any reason why this particular batch should be giving so much grief, but if other batches were equally problematic we'd be seeing a lot of users in here complaining! I wonder if the WCG tech people are aware of this issue?

@Mike Gibson - I have done retries for tasks that failed on various clients. The latest one I saw failed on 7.16.5 (! - so it's not specific to one client...), but most of the failures I'm seeing lately are on 7.14.3, so updating is not likely to help!

@smeyer55 - A successful FAH2 job doesn't have a wingman; what you're seeing is a failure followed by a failed retry. Eventually it'll get sent either to Linux or to a Windows system that doesn't fail... I am frequently the third user to have tried one of these, as the first retry went to another Windows 10 system. A recent case in point is FAH2_002735_zinc1628661_000004_000008_127: the first task was on Windows 10 Core x64 Edition (18362) with client 7.14.3 and got a "wrong size" error, and the first retry was on Windows 10 Enterprise x64 Edition (18363), again with client 7.14.3, and got an "MD5 check failed" error. Both of them failed on the same file, which had one of those long hexadecimal file names and a .dms extension. Mine ran o.k.! (Ubuntu 18.04, client 7.14.2.)

Whatever the issue might be, it's getting to be almost as regular as the Windows 8 "Required Privilege not held" error (as seen from my retry-processing viewpoint)... I hope something comes along to resolve this, but I'm at a loss to explain it (and have no Windows systems to look at for ideas...).

Cheers - Al.
[Edit 1 times, last edit by alanb1951 at May 4, 2020 1:53:13 AM]
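The two failure modes reported in this thread, "wrong size" and "MD5 check failed", are what you would expect from a client that compares each downloaded input file against a published byte count and MD5 checksum. The Python sketch below illustrates that kind of check under that assumption; it is not BOINC's own code, and the helper name `verify_input_file` and its parameters are hypothetical stand-ins for the size and checksum supplied with each file.

```python
import hashlib
import os


def verify_input_file(path: str, expected_size: int, expected_md5: str,
                      chunk_size: int = 1 << 20) -> str:
    """Return "ok", "wrong size" or "MD5 check failed" for a downloaded file.

    Hypothetical helper: expected_size and expected_md5 stand in for the
    byte count and checksum published for each input file.
    """
    # Cheap check first: a truncated or over-long download fails here.
    if os.path.getsize(path) != expected_size:
        return "wrong size"
    # Same length, so hash the contents in chunks to catch corrupted bytes.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Compare case-insensitively, since hex digests may be written in either case.
    if digest.hexdigest() != expected_md5.lower():
        return "MD5 check failed"
    return "ok"


if __name__ == "__main__":
    import tempfile

    data = b"example input file contents"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    good_md5 = hashlib.md5(data).hexdigest()
    print(verify_input_file(tmp.name, len(data), good_md5))      # ok
    print(verify_input_file(tmp.name, len(data) + 1, good_md5))  # wrong size
    print(verify_input_file(tmp.name, len(data), "0" * 32))      # MD5 check failed
    os.remove(tmp.name)
```

In this sketch a truncated or padded download trips the size check, while a file of the right length with damaged bytes only shows up in the MD5 comparison, so a single bad transfer could surface as either error.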
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 365 Status: Offline
This is not a client version issue. It happens on SCC and MIP also. I've been seeing checksum errors sporadically for several months on all my machines. The techs don't seem to be very concerned, probably because the retries process correctly. Every once in a while I do see a unit where all retries fail and the unit stops being sent out. I would think the techs know when that happens.
[Edit 4 times, last edit by AgrFan at May 4, 2020 2:15:30 AM]
yoerik
Senior Cruncher Canada Joined: Mar 24, 2020 Post Count: 413 Status: Offline
Quoting AgrFan: "This is not a client version issue. It happens on SCC and MIP also. I've been seeing checksum errors sporadically for several months on all my machines. [...]"
That it's happening on other WCG subprojects would make me more inclined to think it's an issue with your machine. Unless you have another theory: given the number of work units, what are the odds of you getting bad WUs that frequently when they run fine on other PCs?
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Quoting yoerik (replying to smeyer55's first post): "Yeah, your client is out of date. That could be a major factor in what's causing the errors."
About 0.00000001% or less. I'm running 7.14.2 with flawless FAHB results.
smeyer55
Senior Cruncher Joined: Feb 15, 2009 Post Count: 303 Status: Offline
I'm running other FAH work units fine on all the machines. It is just that particular series that is failing.
Steve
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I haven't had an error on FAHB for a very long time. I had a few tasks from this particular 2735 batch in my log and they completed fine, but that was on April 22 and May 1:
Project | Application | Name | Elapsed (CPU time) | Completed | Reported | CPU % | Status
World Community Grid | 7.30 FightAIDS@Home - Phase 2 | FAH2_002735_zinc16286661_000003_000037_133_1 | 05:07:31 (04:46:32) | 5/1/2020 5:38:13 PM | 5/1/2020 5:42:36 PM | 93.18 | Reported: OK
World Community Grid | 7.30 FightAIDS@Home - Phase 2 | FAH2_002735_zinc16286661_000002_000030_101_1 | 05:24:50 (04:51:08) | 4/22/2020 6:12:56 PM | 4/22/2020 6:53:50 PM | 89.63 | Reported: OK
The only issue I see is the occasional download failure on FAHB and MIP1, but I have no log of those in BOINCStats. It doesn't bother me.
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7574 Status: Offline
This is just a shot in the dark, but a couple of years ago I had this same problem occurring sporadically on some machines at the far end of my wireless network. It was 99.9% cured when I upgraded from Wireless-G to Wireless-N. If you have a wired connection, the problem might be somewhere in the connections or maybe a defective cable, especially if it is only on one machine.
Just a thought. Cheers
Sgt. Joe
*Minnesota Crunchers*
[Edit 1 times, last edit by Sgt.Joe at May 4, 2020 1:42:08 PM]