Total posts in this thread: 47
Posts: 47   Pages: 5
This topic has been viewed 7348 times and has 46 replies
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: All jobs are failing with Invalid

Hi SekeRob.

You asked for it.

Project Name: FightAIDS@Home - Phase 2
Created: 04/23/2016 20:15:44
Name: FAH2_000077_avx38672_000072_0022_021
Minimum Quorum: 1
Replication: 2


Result Name | OS type | OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit

FAH2_ 000077_ avx38672_ 000072_ 0022_ 021_ 2-- | Linux | 3.10.0-327.4.5.el7.x86_64 | - | In Progress | 4/25/16 05:03:35 | 4/29/16 05:03:35 | 8.32 | 116.6 / 0.0

FAH2_ 000077_ avx38672_ 000072_ 0022_ 021_ 1-- | Linux | 3.19.0-20-generic | - | In Progress | 4/25/16 05:03:31 | 4/29/16 05:03:31 | 3.63 | 77.7 / 0.0

FAH2_ 000077_ avx38672_ 000072_ 0022_ 021_ 0-- | Linux | 3.16.0-70-generic | 715 | Invalid | 4/24/16 02:50:20 | 4/25/16 05:02:00 | 13.06 | 490.2 / 0.0 === ME.

@TECHS, why are two copies sent out after an invalid? What happens when both copies start trickling? What happens if both finish properly? This is very contrary to what you indicated FAH2 would be.
[Apr 26, 2016 6:46:11 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: All jobs are failing with Invalid

Seeing the same failure issues for my WUs after my computer is offline (which it regularly is).

No more FA@H for me :(
[Apr 28, 2016 3:25:35 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: All jobs are failing with Invalid

Sekerob,

Yes it is something I've been looking into. Two copies at the same time should not be sent out. I thought I had a fix for it within the transitioner, but that does not appear to be working as expected. I need to enable more logging to see if I can catch where it is getting bumped.

Thanks,
-Uplinger
[Apr 28, 2016 2:54:30 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: All jobs are failing with Invalid

Not just for this project but in general: if a machine has consistently been returning invalid results, it would be nice to stop sending WUs to it for those particular projects. Sometimes an issue on a single machine just causes more work to be done overall. Take OET: if a host returns a result but is not trusted to return valid results, then when the wingman returns theirs, the check will show that the two are not identical and the WU gets sent to a third computer. So you have gone from the WU needing to be sent to one computer to three computers needing to work on the same WU.
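
To put rough numbers on that, here is a minimal sketch. It is not the actual server logic: it assumes a simplified model in which a trusted host's result is accepted on its own, an untrusted host's result is compared against one wingman, and a mismatch sends exactly one more copy. The function name `copies_needed` and the model itself are illustrative assumptions.

```python
def copies_needed(host_trusted: bool, results_match: bool) -> int:
    """Hosts that end up computing one work unit (simplified, assumed model)."""
    if host_trusted:
        return 1  # result accepted without a wingman
    if results_match:
        return 2  # untrusted host plus one wingman that agrees
    return 3      # mismatch: a third host is needed to settle it


# Print the three cases: trusted, untrusted with agreement, untrusted with mismatch.
for trusted, match in [(True, True), (False, True), (False, False)]:
    print(f"trusted={trusted} match={match} copies={copies_needed(trusted, match)}")
```

Under these assumptions, one misbehaving machine triples the work for every WU it touches, which is the point being made above.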
[May 2, 2016 2:46:09 AM]
supdood
Senior Cruncher
USA
Joined: Aug 6, 2015
Post Count: 333
Status: Offline
Re: All jobs are failing with Invalid

Sekerob,

Yes it is something I've been looking into. Two copies at the same time should not be sent out. I thought I had a fix for it within the transitioner, but that does not appear to be working as expected. I need to enable more logging to see if I can catch where it is getting bumped.

Thanks,
-Uplinger

Just had one go invalid (unknown reason) and then two additional WUs were created and sent:


Result Log

Result Name: FAH2_ 000081_ avx38741_ 000032_ 0054_ 021_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
[08:25:36] INFO:Turning trickle messaging on.
[08:25:36] INFO:Turning intermediate uploads on.
%IMPACT-I: Requested file to open for appending md.out Does not exist.
Opening it as a new file.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic
Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 0.55000
agbnpf_assign_parameters(): info: attempting to load from SQL tables.
[08:39:33] INFO: Checkpointed. Progress 1000 of 100000 steps complete CPU time 733.859904
[08:54:02] INFO: Checkpointed. Progress 2000 of 100000 steps complete CPU time 1511.244087
[09:09:24] INFO: Checkpointed. Progress 3000 of 100000 steps complete CPU time 2297.067925
[09:23:59] INFO: Checkpointed. Progress 4000 of 100000 steps complete CPU time 3096.136247
[09:38:59] INFO: Checkpointed. Progress 5000 of 100000 steps complete CPU time 3886.671314
[09:53:01] INFO: Checkpointed. Progress 6000 of 100000 steps complete CPU time 4671.824347
[10:07:43] INFO: Checkpointed. Progress 7000 of 100000 steps complete CPU time 5471.095471
[10:28:14] INFO: Checkpointed. Progress 8000 of 100000 steps complete CPU time 6264.766158
[10:41:56] INFO: Checkpointed. Progress 9000 of 100000 steps complete CPU time 7061.338465
[10:56:53] INFO: Sending trickle message to server.
[10:56:53] INFO: Starting intermediate upload, index = 1
[10:56:53] INFO: Checkpointed. Progress 10000 of 100000 steps complete CPU time 7861.826396
.......
[13:09:31] INFO: Checkpointed. Progress 80000 of 100000 steps complete CPU time 65666.575422
[13:24:29] INFO: Checkpointed. Progress 81000 of 100000 steps complete CPU time 66456.392885
[13:38:27] INFO: Checkpointed. Progress 82000 of 100000 steps complete CPU time 67241.062315
[13:52:58] INFO: Checkpointed. Progress 83000 of 100000 steps complete CPU time 68030.973378
[14:07:03] INFO: Checkpointed. Progress 84000 of 100000 steps complete CPU time 68813.770796
[14:21:08] INFO: Checkpointed. Progress 85000 of 100000 steps complete CPU time 69601.295044
[14:35:09] INFO: Checkpointed. Progress 86000 of 100000 steps complete CPU time 70375.231605
[15:19:28] INFO: Checkpointed. Progress 87000 of 100000 steps complete CPU time 71158.309825
[15:33:41] INFO: Checkpointed. Progress 88000 of 100000 steps complete CPU time 71950.342502
[15:47:10] INFO: Checkpointed. Progress 89000 of 100000 steps complete CPU time 72736.010338
[16:03:28] INFO:Turning trickle messaging on.
[16:03:28] INFO:Turning intermediate uploads on.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic
Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 0.55000
agbnpf_assign_parameters(): info: attempting to load from SQL tables.
[16:09:13] INFO: Sending trickle message to server.
[16:09:13] INFO: Starting intermediate upload, index = 9
[16:09:13] INFO: Checkpoint skipped. Progress 90000/100000 CPU time 73079.368196
[08:15:54] INFO:Turning trickle messaging on.
[08:15:54] INFO:Turning intermediate uploads on.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic
Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 0.55000
agbnpf_assign_parameters(): info: attempting to load from SQL tables.
[08:32:12] INFO: Sending trickle message to server.
[08:32:12] INFO: Starting intermediate upload, index = 9
[08:32:12] INFO: Checkpointed. Progress 90000 of 100000 steps complete CPU time 73544.297976
[08:46:00] INFO: Checkpointed. Progress 91000 of 100000 steps complete CPU time 74324.302976
[08:59:41] INFO: Checkpointed. Progress 92000 of 100000 steps complete CPU time 75115.633649
[09:15:11] INFO: Checkpointed. Progress 93000 of 100000 steps complete CPU time 75909.865940
[09:29:08] INFO: Checkpointed. Progress 94000 of 100000 steps complete CPU time 76693.692965
[09:43:07] INFO: Checkpointed. Progress 95000 of 100000 steps complete CPU time 77478.596396
[10:01:13] INFO: Checkpointed. Progress 96000 of 100000 steps complete CPU time 78267.571453
[10:14:48] INFO: Checkpointed. Progress 97000 of 100000 steps complete CPU time 79053.098889
[10:29:13] INFO: Checkpointed. Progress 98000 of 100000 steps complete CPU time 79841.527943
[10:43:02] INFO: Checkpointed. Progress 99000 of 100000 steps complete CPU time 80622.780951
[10:57:30] INFO: Checkpointed. Progress 100000 of 100000 steps complete CPU time 81416.061636
%IMPACT-I: Species 1 written to SQL file md-out1.dms
%IMPACT-I: Species 2 written to SQL file md-out2.dms
10:57:32 (2580): called boinc_finish(0)

</stderr_txt>
]]>


FAH2_ 000081_ avx38741_ 000032_ 0054_ 021_ 2-- Microsoft Windows 10 Professional x64 Edition, (10.00.10586.00) - In Progress 5/5/16 14:57:59 5/9/16 14:57:59 0.00 0.0 / 0.0

FAH2_ 000081_ avx38741_ 000032_ 0054_ 021_ 1-- Microsoft Windows 7 x64 Edition, Service Pack 1, (06.01.7601.00) - In Progress 5/5/16 14:57:58 5/9/16 14:57:58 0.00 0.0 / 0.0

FAH2_ 000081_ avx38741_ 000032_ 0054_ 021_ 0-- Microsoft Windows 7 Enterprise x64 Edition, Service Pack 1, (06.01.7601.00) 714 Invalid 5/2/16 12:25:28 5/5/16 14:57:45 22.62 451.0 / 0.0
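
For anyone comparing logs like the one above, a small script can pull out the interesting events. This is only a sketch: it assumes that each "Turning trickle messaging on" line marks one start of the application and that "Checkpoint skipped" lines are worth flagging, both of which are readings of this log rather than documented behaviour. The function name `summarize_stderr` is made up for the example.

```python
import re

RESTART_MARKER = "Turning trickle messaging on"
CHECKPOINT = re.compile(r"Checkpointed\. Progress (\d+) of (\d+) steps")
SKIPPED = re.compile(r"Checkpoint skipped\. Progress (\d+)/(\d+)")


def summarize_stderr(text: str) -> dict:
    """Count application starts, collect skipped checkpoints, find the last checkpointed step."""
    starts = 0
    skipped = []
    last_step = 0
    for line in text.splitlines():
        if RESTART_MARKER in line:
            starts += 1
        m = SKIPPED.search(line)
        if m:
            skipped.append(int(m.group(1)))
        m = CHECKPOINT.search(line)
        if m:
            last_step = max(last_step, int(m.group(1)))
    return {"starts": starts, "skipped_checkpoints": skipped, "last_step": last_step}


# A few lines copied from the log above.
sample = """\
[15:47:10] INFO: Checkpointed. Progress 89000 of 100000 steps complete CPU time 72736.010338
[16:03:28] INFO:Turning trickle messaging on.
[16:09:13] INFO: Checkpoint skipped. Progress 90000/100000 CPU time 73079.368196
[08:15:54] INFO:Turning trickle messaging on.
[08:32:12] INFO: Checkpointed. Progress 90000 of 100000 steps complete CPU time 73544.297976
"""
print(summarize_stderr(sample))
```

Run over the full log, the same function would report every start and the one skipped checkpoint at step 90000, which makes it easier to line up an invalid result against a valid one.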
----------------------------------------
Crunch with BOINC team USA
www.boincusa.com

[May 5, 2016 3:03:55 PM]
Ian Cantwell
Cruncher
Joined: Jul 19, 2013
Post Count: 15
Status: Offline
Re: All jobs are failing with Invalid

The following is listed as invalid: FAH2_ 000091_ avx38782_ 000051_ 0005_ 020_ 0--
I've not had this problem before. The result log has

<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
[08:53:44] INFO:Turning trickle messaging on.
[08:53:44] INFO:Turning intermediate uploads on.
%IMPACT-I: Requested file to open for appending md.out Does not exist.
Opening it as a new file.
%IMPACT-I: Softcore binding energy with umax = 1000.00000
%IMPACT-I: Using AGBNP2: Analytical Generalized Born Model + Analytic
Non-Polar Hydration Model
%IMPACT-I: Hybrid potential for binding with lambda = 0.00600
agbnpf_assign_parameters(): info: attempting to load from SQL tables.

Then it checkpoints successfully to the end and finishes with
%IMPACT-I: Species 1 written to SQL file md-out1.dms
%IMPACT-I: Species 2 written to SQL file md-out2.dms
08:35:08 (7752): called boinc_finish(0)
[Jun 6, 2016 1:55:07 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: All jobs are failing with Invalid

The operation with FAH2 is straightforward: do not interrupt communication while computing. If the last trickle is sent and any previous trickle has not already been reported, validation fails. If you set the trickle log flag in cc_config.xml,

<trickle_debug>1</trickle_debug>

then the actual trickle initiation is printed in the event log.
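
For those who would rather script the change than edit the file by hand, here is a minimal sketch. It assumes the usual cc_config.xml layout, in which log flags sit inside a `<log_flags>` element under the `<cc_config>` root. The function name `enable_trickle_debug` is made up for the example, and it works on the XML text only, so reading and writing the real file in the BOINC data directory is left out.

```python
import xml.etree.ElementTree as ET


def enable_trickle_debug(config_xml: str) -> str:
    """Return the given cc_config.xml text with <trickle_debug>1</trickle_debug> set."""
    root = ET.fromstring(config_xml)
    log_flags = root.find("log_flags")
    if log_flags is None:
        # Assumed layout: log flags live in <log_flags> directly under the root.
        log_flags = ET.SubElement(root, "log_flags")
    flag = log_flags.find("trickle_debug")
    if flag is None:
        flag = ET.SubElement(log_flags, "trickle_debug")
    flag.text = "1"
    return ET.tostring(root, encoding="unicode")


print(enable_trickle_debug("<cc_config><log_flags></log_flags></cc_config>"))
```

The client has to re-read its config files (or be restarted) before a changed flag takes effect.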
[Jun 6, 2016 7:17:27 AM]
Ian Cantwell
Cruncher
Joined: Jul 19, 2013
Post Count: 15
Status: Offline
Re: All jobs are failing with Invalid

I rechecked the result log and found "INFO: Starting intermediate upload, index = 1" through "index = 9"; as far as I can see, all trickles were reported. Comparing it with a valid unit, I see no difference.
My computers do sometimes spontaneously go to sleep, but if this were an issue, more of my units would fail.
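
The same check can be done mechanically. The sketch below lists the upload indices found in a result log and reports any that are missing or repeated. The function names are made up for the example, and the expected last index is passed in by hand because the log itself does not say how many uploads there should be. Note the limit of this check: a "Starting intermediate upload" line only shows that the upload was started on the client, not that the server received it.

```python
import re

UPLOAD = re.compile(r"Starting intermediate upload, index = (\d+)")


def upload_indices(text):
    """Return every intermediate upload index, in the order it appears in the log."""
    return [int(m.group(1)) for m in UPLOAD.finditer(text)]


def missing_and_repeated(indices, expected_last):
    """Return (missing, repeated) indices for the range 1..expected_last."""
    seen = set(indices)
    missing = [i for i in range(1, expected_last + 1) if i not in seen]
    repeated = sorted({i for i in indices if indices.count(i) > 1})
    return missing, repeated


# Made-up sample in the same format as the result logs in this thread.
sample = """\
[10:56:53] INFO: Starting intermediate upload, index = 1
[12:01:10] INFO: Starting intermediate upload, index = 2
[09:30:00] INFO: Starting intermediate upload, index = 2
"""
found = upload_indices(sample)
print(found, missing_and_repeated(found, expected_last=3))
```

A repeated index, like index 9 in the longer log earlier in the thread, points to an upload that was started again after a restart.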
[Jun 6, 2016 1:25:55 PM]
Greger
Cruncher
Joined: Aug 1, 2013
Post Count: 29
Status: Offline
Re: All jobs are failing with Invalid

There was no information that a network connection was required; I found this out after a huge number of tasks went invalid.

260 tasks lost

Learn the hard way for each project.
[Jun 25, 2016 9:35:18 PM]
Caranthir
Cruncher
Joined: May 7, 2016
Post Count: 8
Status: Offline
Re: All jobs are failing with Invalid

I also had lots of invalid results with an old BOINC client. My mistake was downloading the BOINC client from the WCG website ( https://secure.worldcommunitygrid.org/reg/ms/viewDownloadAgain.do ), which is a much older version (7.2.47) of BOINC. I got lots of invalid results with this client and wasted a lot of time. Then, as I was looking for the reason for the invalid results, I saw that there is a much newer version of BOINC (7.6.22) on the official website ( https://boinc.berkeley.edu/download.php ). I started using the newer version and now all my results are valid.
[Jul 15, 2016 4:45:29 PM]