World Community Grid Forums
Category: Completed Research Forum: FightAIDS@Home Thread: Large Number of units marked invalid |
Thread Status: Active Total posts in this thread: 14
slakin
Advanced Cruncher Joined: Jul 4, 2008 Post Count: 79 Status: Offline Project Badges: |
On what is normally a reliable machine, I have had 72 of the new FAHV units go invalid. Strange thing is when I look at the log, I do not see any error. Here is one example:
Result Log
Result Name: FAHV_ 1000016_ 3j3y-1R-P1_ 16060_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[19:32:09] Number of tasks = 4
[19:32:09] Running task 0,CPU time at start of task 0 was 0.000000
[19:32:09] ./ZINC95375478.pdbqt size = 26 7 ../../projects/www.worldcommunitygrid.org/fahv.3j3y-1R-P1.pdbqt size = 4386 0
[19:34:22] Finished task #0 cpu time used 127.078415
[19:34:22] Running task 1,CPU time at start of task 1 was 127.078415
[19:34:22] ./ZINC01112503.pdbqt size = 32 6 ../../projects/www.worldcommunitygrid.org/fahv.3j3y-1R-P1.pdbqt size = 4386 0
[19:37:20] Finished task #1 cpu time used 168.371879
[19:37:20] Running task 2,CPU time at start of task 2 was 295.450294
[19:37:20] ./ZINC72458436.pdbqt size = 28 7 ../../projects/www.worldcommunitygrid.org/fahv.3j3y-1R-P1.pdbqt size = 4386 0
[19:39:35] Finished task #2 cpu time used 128.186022
[19:39:35] Running task 3,CPU time at start of task 3 was 423.636316
[19:39:35] ./ZINC11910731.pdbqt size = 33 4 ../../projects/www.worldcommunitygrid.org/fahv.3j3y-1R-P1.pdbqt size = 4386 0
[19:41:57] Finished task #3 cpu time used 134.816064
19:41:57 (4980): called boinc_finish(0)
</stderr_txt>
]]>
Let me know what additional data you might want to look into this. Thanks |
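For anyone who wants to sanity-check a log like the one above, here is a minimal Python sketch that totals the per-task CPU times from the "Finished task" lines. The function name `task_cpu_times` and the regular expression are made up for illustration; they are not part of BOINC or the FAHV application, and the sketch assumes the stderr text keeps the line format shown in this post.

```python
import re

# Matches lines such as "[19:34:22] Finished task #0 cpu time used 127.078415".
# The pattern is an assumption based only on the log format quoted in this thread.
FINISHED = re.compile(r"Finished task #(\d+) cpu time used ([0-9.]+)")


def task_cpu_times(stderr_text):
    """Return {task_number: cpu_seconds} for every finished task in the log."""
    return {int(num): float(secs) for num, secs in FINISHED.findall(stderr_text)}


# The four "Finished task" lines copied from the log above.
log = """
[19:34:22] Finished task #0 cpu time used 127.078415
[19:37:20] Finished task #1 cpu time used 168.371879
[19:39:35] Finished task #2 cpu time used 128.186022
[19:41:57] Finished task #3 cpu time used 134.816064
"""

times = task_cpu_times(log)
total = sum(times.values())
print(f"{len(times)} tasks finished, {total:.1f} s of CPU time in total")
```

For this log the four tasks add up to roughly 558 seconds, which agrees with the running "CPU time at start of task" figures, so nothing in the log itself points to a failed task.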
||
|
Dayle Diamond
Senior Cruncher Joined: Jan 31, 2013 Post Count: 447 Status: Offline Project Badges: |
Ditto.
|
||
|
Seoulpowergrid
Veteran Cruncher Joined: Apr 12, 2013 Post Count: 815 Status: Offline Project Badges: |
I've got them spread over four machines, all Windows boxes. Most of the invalids I've got were invalid for me and one other user before a valid was produced. Run times were basically the same across my machines, the other machines returning invalid results, and the machines returning valid results.
EDIT:
FAHV_1000017_3j3y-1R-P1_16167_ 2--
FAHV_ 1000013_ 3j3y-1R-P1_ 1455_ 2--
FAHV_ 1000017_ 3j3y-1R-P1_ 21698_ 3--
FAHV_ 1000010_ 4xfx_ P4_ Rigid_ 7590_ 1--
FAHV_ 1000014_ 3j3y-1R-P1_ 1324_ 1--
FAHV_ 1000020_ 3j3y-1R-P1_ 6259_ 1--
[Edit 1 times, last edit by Seoulpowergrid at Dec 6, 2016 2:13:46 AM] |
||
|
Steve1979
Cruncher Joined: Nov 22, 2004 Post Count: 32 Status: Offline Project Badges: |
I also have an invalid result, with 3 invalids before mine. The result after mine was valid. All results were granted the full BOINC credit (except the error).
Steve.
Result Name | OS type / OS version | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 5-- | Microsoft Windows 7 Enterprise x64 Edition, Service Pack 1, (06.01.7601.00) | 734 | Valid | 06/12/16 09:30:06 | 06/12/16 10:16:38 | 0.08 | 3.8 / 3.8
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 4-- | Microsoft Windows 7 Enterprise x64 Edition, Service Pack 1, (06.01.7601.00) | 734 | Invalid | 05/12/16 16:22:42 | 06/12/16 09:29:49 | 0.07 | 2.0 / 2.0
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 3-- | Microsoft Windows 7 x64 Edition, Service Pack 1, (06.01.7601.00) | 734 | Error | 05/12/16 16:20:36 | 05/12/16 16:22:40 | 0.00 | 7.9 / 0.0
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 2-- | Microsoft Windows 10 Professional x64 Edition, (10.00.14393.00) | 734 | Invalid | 04/12/16 23:27:22 | 05/12/16 16:20:31 | 0.10 | 2.8 / 2.8
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 1-- | Microsoft Windows 7 x64 Edition, Service Pack 1, (06.01.7601.00) | 734 | Invalid | 04/12/16 21:35:42 | 04/12/16 23:27:11 | 0.06 | 1.8 / 1.8
FAHV_ 1000014_ 3j3y-1R-P1_ 20365_ 0-- | Microsoft Windows 10 Core x64 Edition, (10.00.14393.00) | 734 | Invalid | 04/12/16 13:23:25 | 04/12/16 21:35:09 | 0.06 | 1.2 / 1.2
|
||
|
RetiredTech
Advanced Cruncher Canada Joined: Feb 2, 2012 Post Count: 91 Status: Offline Project Badges: |
I too have a growing list of invalid results. Have rebooted and the invalid list continues to grow. Project reset?
|
||
|
supdood
Senior Cruncher USA Joined: Aug 6, 2015 Post Count: 333 Status: Offline Project Badges: |
Same here, and from devices that almost never go invalid. Also had one machine that had never had an error on WCG have nothing but errors ("maximum elapsed time exceeded" even though they were only 4 hours of CPU time). Moved that one back to Zika.
Hopefully these issues don't persist throughout the re-launch. |
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges: |
Greetings all,
There was a bug in the validator code, and I have corrected it. The validator was simply marking results as invalid improperly. Results that were marked valid were in fact valid and were sent for packaging. We are sorry that results were improperly marked invalid. Thanks, -Uplinger |
||
|
slakin
Advanced Cruncher Joined: Jul 4, 2008 Post Count: 79 Status: Offline Project Badges: |
Thanks for the update, Keith. Since I have almost 300 invalids now, I was curious whether this would impact my machine reliability rating.
|
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline Project Badges: |
Unfortunately, it will for a short period of time. Once your computer gets back above the reliability threshold, it will again be allowed to download FAHV workunits without issue. The reliability rating is per project, so only the FAHV application will be affected. I think if you return something like 20 valid results in a row, you can go from 0.0 to 1.0. With the shortness of these workunits, I think most machines would be back to reliable within 24 hours easily.
Thanks, -Uplinger |
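As a rough check of that "within 24 hours" estimate, here is a back-of-the-envelope sketch in Python. It is only arithmetic under stated assumptions, not the server's actual reliability formula, which is not described in this thread: results are crunched one after another on a single core, every one validates, and 20 consecutive valid results are enough (uplinger's "something like 20"). The per-result run times are the 0.06 to 0.10 hours shown in Steve1979's table; `hours_to_recover` is a made-up helper name.

```python
# Assumption: "something like 20" valid results in a row restores the rating.
VALID_RESULTS_NEEDED = 20


def hours_to_recover(hours_per_result, results_needed=VALID_RESULTS_NEEDED):
    """Crunching time for the needed results, run back to back on one core."""
    return hours_per_result * results_needed


# Shortest and longest non-error run times (hours) from Steve1979's table.
fastest = hours_to_recover(0.06)
slowest = hours_to_recover(0.10)
print(f"{fastest:.1f} to {slowest:.1f} hours of crunching for {VALID_RESULTS_NEEDED} results")
```

This counts crunching time only. Waiting for the other copies of each workunit to be returned and validated would add to it, and the sketch does not try to model that.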
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
May I suggest you take time to get proper estimated TTC on the different task types for the various projects, flex and rigid in this FAHV case. I now have 280 'Rigid' tasks in queue on the octo with a 4:02 minute estimate and, of course, a locked-down DCF of 1.0, but they run 2+ hours. Just tell us whether to let them stew till they expire.
Knowing how to size a task has not solved the problem in the past and won't in the future... it's root cause addressing that's needed: smarten up the feeder/distributor assembly to know not to slap the same current median on newly distributed tasks, so SNAFUs won't impact volunteer clients. |
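To put numbers on that mismatch, here is a small Python sketch of the arithmetic. It assumes that "octo" means 8 tasks running at once and treats the "2+ hours" as exactly 2 hours, so the real figure would be higher; `wall_clock_hours` is a hypothetical helper, not a BOINC function.

```python
QUEUED_TASKS = 280
ESTIMATE_SECONDS = 4 * 60 + 2   # the 4:02 estimate shown by the client
ACTUAL_SECONDS = 2 * 3600       # "2+ hours", taken as a lower bound
CONCURRENT_TASKS = 8            # assumption: "octo" = 8 tasks in parallel


def wall_clock_hours(tasks, seconds_each, concurrent):
    """Wall-clock hours to clear the queue with `concurrent` tasks at a time."""
    return tasks * seconds_each / concurrent / 3600


estimated = wall_clock_hours(QUEUED_TASKS, ESTIMATE_SECONDS, CONCURRENT_TASKS)
actual = wall_clock_hours(QUEUED_TASKS, ACTUAL_SECONDS, CONCURRENT_TASKS)
ratio = ACTUAL_SECONDS / ESTIMATE_SECONDS

print(f"estimated: {estimated:.1f} h, actual: at least {actual:.1f} h")
print(f"each task runs about {ratio:.0f}x longer than estimated")
```

Under these assumptions, a queue the client expects to clear in a couple of hours would take about 70 hours or more.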
||
|
|