Total posts in this thread: 14
Posts: 14   Pages: 2
This topic has been viewed 2512 times and has 13 replies
slakin
Advanced Cruncher
Joined: Jul 4, 2008
Post Count: 79
Large Number of units marked invalid

On what is normally a reliable machine, I have had 72 of the new FAHV units go invalid. The strange thing is that when I look at the log, I do not see any error. Here is one example:


Result Log

Result Name: FAHV_ 1000016_ 3j3y-1R-P1_ 16060_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[19:32:09] Number of tasks = 4
[19:32:09] Running task 0,CPU time at start of task 0 was 0.000000
[19:32:09] ./ZINC95375478.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/fahv.3j3y-1R-P1.pdbqt size = 4386 0
[19:34:22] Finished task #0 cpu time used 127.078415
[19:34:22] Running task 1,CPU time at start of task 1 was 127.078415
[19:34:22] ./ZINC01112503.pdbqt size = 32 6 ../../projects/www.worldcommunitygrid.org/fahv.3j3y-1R-P1.pdbqt size = 4386 0
[19:37:20] Finished task #1 cpu time used 168.371879
[19:37:20] Running task 2,CPU time at start of task 2 was 295.450294
[19:37:20] ./ZINC72458436.pdbqt size = 28 7 ../../projects/www.worldcommunitygrid.org/fahv.3j3y-1R-P1.pdbqt size = 4386 0
[19:39:35] Finished task #2 cpu time used 128.186022
[19:39:35] Running task 3,CPU time at start of task 3 was 423.636316
[19:39:35] ./ZINC11910731.pdbqt size = 33 4 ../../projects/www.worldcommunitygrid.org/fahv.3j3y-1R-P1.pdbqt size = 4386 0
[19:41:57] Finished task #3 cpu time used 134.816064
19:41:57 (4980): called boinc_finish(0)

</stderr_txt>
]]>

Let me know what additional data you need to look into this. Thanks.
[Dec 5, 2016 8:36:43 PM]
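The observation that the log shows no error can be checked mechanically. The following is a minimal sketch, not an official WCG or BOINC tool: it assumes the stderr text has the same line formats as the example in the post above, and the names `summarize_log` and `looks_clean` are hypothetical helpers made up for this illustration. The sample text is copied from that log, with the file-size lines left out.

```python
import re

# Sample copied from the stderr log in the post above (file-size lines omitted).
SAMPLE_LOG = """\
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[19:32:09] Number of tasks = 4
[19:32:09] Running task 0,CPU time at start of task 0 was 0.000000
[19:34:22] Finished task #0 cpu time used 127.078415
[19:34:22] Running task 1,CPU time at start of task 1 was 127.078415
[19:37:20] Finished task #1 cpu time used 168.371879
[19:37:20] Running task 2,CPU time at start of task 2 was 295.450294
[19:39:35] Finished task #2 cpu time used 128.186022
[19:39:35] Running task 3,CPU time at start of task 3 was 423.636316
[19:41:57] Finished task #3 cpu time used 134.816064
19:41:57 (4980): called boinc_finish(0)
"""


def summarize_log(text):
    """Extract task counts, total CPU time, and the exit code from the log text."""
    expected = re.search(r"Number of tasks = (\d+)", text)
    finished = re.findall(r"Finished task #(\d+) cpu time used ([\d.]+)", text)
    exit_code = re.search(r"called boinc_finish\((-?\d+)\)", text)
    return {
        "expected_tasks": int(expected.group(1)) if expected else None,
        "finished_tasks": len(finished),
        "total_cpu_seconds": sum(float(seconds) for _, seconds in finished),
        "exit_code": int(exit_code.group(1)) if exit_code else None,
    }


def looks_clean(summary):
    """True if every announced task finished and the app exited with code 0."""
    return (
        summary["expected_tasks"] is not None
        and summary["expected_tasks"] == summary["finished_tasks"]
        and summary["exit_code"] == 0
    )


summary = summarize_log(SAMPLE_LOG)
print(summary)
print("clean run:", looks_clean(summary))
```

A clean result here only says that the client side finished normally. Validation is done on the server by comparing returned results, so a clean log cannot rule out an invalid mark.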
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 447
Re: Large Number of units marked invalid

Ditto.
[Dec 6, 2016 12:39:28 AM]
Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 815
Re: Large Number of units marked invalid

I've got them spread over four machines, all Windows boxes. Most of my invalids were on work units that came back invalid for me and one other user before a valid result was produced. Run times were basically the same across my machines, the other machines returning invalid results, and the machines returning valid results.

EDIT:
FAHV_1000017_3j3y-1R-P1_16167_ 2--
FAHV_ 1000013_ 3j3y-1R-P1_ 1455_ 2--
FAHV_ 1000017_ 3j3y-1R-P1_ 21698_ 3--
FAHV_ 1000010_ 4xfx_ P4_ Rigid_ 7590_ 1--
FAHV_ 1000014_ 3j3y-1R-P1_ 1324_ 1--
FAHV_ 1000020_ 3j3y-1R-P1_ 6259_ 1--
[Edit 1 times, last edit by Seoulpowergrid at Dec 6, 2016 2:13:46 AM]
[Dec 6, 2016 2:11:25 AM]
Steve1979
Cruncher
Joined: Nov 22, 2004
Post Count: 32
Re: Large Number of units marked invalid

I also have an invalid result, with 3 invalids before mine. The result after mine was valid. All results were granted the full BOINC credit (except the error).

Steve.

Result Name | OS type | OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 5-- | Microsoft Windows 7 | Enterprise x64 Edition, Service Pack 1, (06.01.7601.00) | 734 | Valid | 06/12/16 09:30:06 | 06/12/16 10:16:38 | 0.08 | 3.8 / 3.8
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 4-- | Microsoft Windows 7 | Enterprise x64 Edition, Service Pack 1, (06.01.7601.00) | 734 | Invalid | 05/12/16 16:22:42 | 06/12/16 09:29:49 | 0.07 | 2.0 / 2.0
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 3-- | Microsoft Windows 7 | x64 Edition, Service Pack 1, (06.01.7601.00) | 734 | Error | 05/12/16 16:20:36 | 05/12/16 16:22:40 | 0.00 | 7.9 / 0.0
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 2-- | Microsoft Windows 10 | Professional x64 Edition, (10.00.14393.00) | 734 | Invalid | 04/12/16 23:27:22 | 05/12/16 16:20:31 | 0.10 | 2.8 / 2.8
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 1-- | Microsoft Windows 7 | x64 Edition, Service Pack 1, (06.01.7601.00) | 734 | Invalid | 04/12/16 21:35:42 | 04/12/16 23:27:11 | 0.06 | 1.8 / 1.8
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 0-- | Microsoft Windows 10 | Core x64 Edition, (10.00.14393.00) | 734 | Invalid | 04/12/16 13:23:25 | 04/12/16 21:35:09 | 0.06 | 1.2 / 1.2
[Dec 6, 2016 11:07:32 AM]
RetiredTech
Advanced Cruncher
Canada
Joined: Feb 2, 2012
Post Count: 91
Re: Large Number of units marked invalid

I too have a growing list of invalid results. I have rebooted, but the invalid list continues to grow. Would a project reset help?
[Dec 6, 2016 11:59:25 AM]
supdood
Senior Cruncher
USA
Joined: Aug 6, 2015
Post Count: 333
Re: Large Number of units marked invalid

Same here, and from devices that almost never return invalid results. I also had one machine that had never had an error on WCG return nothing but errors ("maximum elapsed time exceeded", even though the tasks had used only 4 hours of CPU time). I moved that one back to Zika.

Hopefully these issues don't persist throughout the re-launch.
----------------------------------------
Crunch with BOINC team USA
www.boincusa.com

[Dec 6, 2016 1:33:26 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: Large Number of units marked invalid

Greetings all,

There was a bug in the validator code, and I have corrected it. The bug was only marking results as invalid improperly; results that were marked valid were in fact valid and were sent for packaging. We are sorry that results were marked invalid improperly.

Thanks,
-Uplinger
[Dec 7, 2016 12:23:09 AM]
slakin
Advanced Cruncher
Joined: Jul 4, 2008
Post Count: 79
Status: Offline
Project Badges:
Reply to this Post  Reply with Quote 
Re: Large Number of units marked invalid

Thanks for the update, Keith. Since I have almost 300 invalids now, I was curious whether this would affect my machine's reliability rating.
[Dec 7, 2016 1:21:08 AM]   Link   Report threatening or abusive post: please login first  Go to top 
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: Large Number of units marked invalid

Unfortunately, it will, for a short period of time. Once your computer is back above the reliability threshold, it will be able to download FAHV workunits without issue. The reliability rating is tracked per project, so only the FAHV application is affected. I think that if you return something like 20 valid results in a row, you can go from 0.0 to 1.0. Given how short these workunits are, I think most machines will easily be back to reliable within 24 hours.

Thanks,
-Uplinger
[Dec 7, 2016 1:43:53 AM]
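As a rough check on the "within 24 hours" estimate, here is a back-of-the-envelope sketch. It assumes that the four task times from the log in the first post are typical for one FAHV workunit and takes the "something like 20 valid results" figure at face value. It counts CPU time on a single core only and ignores the time spent waiting for other volunteers' results to be validated, so it is a lower bound, not a prediction.

```python
# CPU seconds for the four tasks of one workunit, from the log in the first post.
task_cpu_seconds = [127.078415, 168.371879, 128.186022, 134.816064]

# Assumption: about 20 consecutive valid results restore the rating.
# The post says "something like 20", so this number is approximate.
VALID_RESULTS_NEEDED = 20

seconds_per_workunit = sum(task_cpu_seconds)
hours_on_one_core = VALID_RESULTS_NEEDED * seconds_per_workunit / 3600

print(f"one workunit: {seconds_per_workunit / 60:.1f} min")          # 9.3 min
print(f"20 workunits on one core: {hours_on_one_core:.1f} h")        # 3.1 h
```

At roughly 3 hours of single-core CPU time for 20 workunits of this size, a 24-hour recovery leaves plenty of room for validation delays.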
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Large Number of units marked invalid

May I suggest you take the time to get proper estimated TTC values for the different task types in the various projects, flex and rigid in this FAHV case? I now have 280 'Rigid' tasks queued on the octo with a 4:02 minute estimate and, of course, a locked-down DCF of 1.0, but they run 2+ hours. Just tell us whether we should simply let them stew until they expire.

Knowing how to size a task has not solved the problem in the past and won't in the future. It's addressing the root cause that's needed: smarten up the feeder/distributor assembly so it knows not to slap the same current median on newly distributed tasks, so that SNAFUs won't impact volunteer clients.
[Dec 7, 2016 11:54:15 AM]
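The size of the mismatch described in the post above can be worked out from its own numbers. This is only an illustrative sketch under stated assumptions: "4:02" is read as 4 minutes 2 seconds, "2+ hours" is taken as exactly 2 hours (so the real figure is higher), and "octo" is assumed to mean eight tasks running at once.

```python
QUEUED_TASKS = 280              # "280 'Rigid'" tasks in the queue
ESTIMATE_MINUTES = 4 + 2 / 60   # the quoted 4:02 estimate, read as min:sec
ACTUAL_HOURS = 2.0              # "they run 2+ hours", so this is a lower bound
CONCURRENT_TASKS = 8            # assumption: "octo" means eight tasks at once

# Wall-clock time to drain the queue, as estimated and as actually observed.
estimated_wall_hours = QUEUED_TASKS * ESTIMATE_MINUTES / 60 / CONCURRENT_TASKS
actual_wall_hours = QUEUED_TASKS * ACTUAL_HOURS / CONCURRENT_TASKS

# How far off the per-task estimate is.
underestimate_factor = ACTUAL_HOURS * 60 / ESTIMATE_MINUTES

print(f"queue as estimated: {estimated_wall_hours:.1f} h")   # 2.4 h
print(f"queue as observed:  {actual_wall_hours:.1f} h")      # 70.0 h
print(f"underestimate:      {underestimate_factor:.0f}x")    # 30x
```

Under these assumptions, a queue the client sized at under 3 hours of work would take about 70 hours to clear, which is why the poster asks whether the tasks should be left to expire.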