World Community Grid Forums
Category: Completed Research Forum: Discovering Dengue Drugs - Together - Phase 2 Forum Thread: New thread for It's raining Dengue, Hallelujah!... |
Thread Status: Active Total posts in this thread: 2370
Dataman
Ace Cruncher Joined: Nov 16, 2004 Post Count: 4865 Status: Offline Project Badges: |
How are WUs that error out handled? I got the following error WU:

Result Name           App Version Number  Status  Sent Time         Time Due / Return Time  CPU Time (hours)  Claimed / Granted BOINC Credit
dg05_d294_pca002_6--  640                 Error   6/25/12 19:49:18  6/25/12 20:03:35        0.00              0.0 / 0.0
dg05_d294_pca002_5--  640                 Error   6/25/12 19:47:24  6/25/12 19:48:58        0.00              0.0 / 0.0  <- me
dg05_d294_pca002_4--  640                 Error   6/25/12 19:47:21  6/25/12 20:02:15        0.00              0.0 / 0.0
dg05_d294_pca002_3--  640                 Error   6/25/12 19:44:08  6/25/12 19:45:37        0.00              0.0 / 0.0
dg05_d294_pca002_2--  640                 Error   6/25/12 19:44:06  6/25/12 19:45:45        0.00              0.0 / 0.0
dg05_d294_pca002_0--  640                 Error   6/25/12 19:36:54  6/25/12 19:37:37        0.00              0.0 / 0.0
dg05_d294_pca002_1--  640                 Error   6/25/12 19:36:54  6/25/12 19:40:20        0.00              0.0 / 0.0

Shouldn't have this WU been caught after the third error? What is the policy about this type of WUs? Do they have to fail 7 times to be taken off the crunching list?
Thanks, CJSL

The max error for this project is 7. Policy? |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I thought circulation stops after 5 or 6 *reported* errors, meaning there will always be a 7th. The setting varies [but not 100%]: the ZR is lower maybe, and a little higher for this one, as some clients do seem to succeed at tasks. So far I have had 2 that got rated invalid for no apparent reason and a few going south at 0:00:01.
Looking at the computing time, the only loss in this case was bandwidth use. No indication was given of what the error log said... a -200 maybe? --//-- |
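
For anyone trying to picture the cap being discussed, here is a minimal sketch. It is not World Community Grid's server code: the only number taken from this thread is the cap of 7, and the function name `workunit_retired` is made up for illustration.

```python
# Sketch only: a hypothetical helper, not the project's actual scheduler logic.
MAX_ERRORS = 7  # error cap quoted in this thread for the project


def workunit_retired(statuses, max_errors=MAX_ERRORS):
    """Return True once the number of errored copies reaches the cap."""
    errors = sum(1 for status in statuses if status == "Error")
    return errors >= max_errors


# The seven copies of dg05_d294_pca002 listed above all came back as Error.
copies = ["Error"] * 7
print(workunit_retired(copies))      # True
print(workunit_retired(copies[:3]))  # False: three errors are below the cap
```

Under that assumption, a work unit with three errored copies would still be in circulation, which matches what was seen here.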
||
|
Punchy
Advanced Cruncher Texas Joined: Nov 30, 2010 Post Count: 60 Status: Offline Project Badges: |
I have a WU that has been running for 38 hours and estimates another 52 hours to completion. The maximum I have seen for other DDDT2 work on this system is 4 hours. Do I let it continue?
----------------------------------------
dg05_c459_pca005_0 |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Best suspend that task with LAIM off, then resume it manually or let it sit in "waiting to run" until another task finishes. This method worked for me on a task that seemed to loop at 27% at 4.5 hours CPU time. When resumed, the remaining 75% finished in 3.5 hours.
--//-- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Just curious... was the 50,000 results left estimate all the remaining B + C work units times quorums, or was that disregarding any quorum multiplier?
Thanks! |
||
|
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082 Status: Offline Project Badges: |
Shouldn't have this WU been caught after the third error? What is the policy about this type of WUs? Do they have to fail 7 times to be taken off the crunching list?
I apologize for not posting the error log... I wasn't interested in the error... I was more interested in why the WU had to error out so many times. Here it is (yes... error 200):
Result Log
Result Name: dg05_d294_pca002_5--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>dg05_d_g30_d291_to_d300_typeC_restr_defWCG0003GCW.str.gzb</file_name>
<error_code>-200</error_code>
</file_xfer_error>
</message>
]]> |
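
If you want to pull the failing file and error code out of a result log like this one without reading it by eye, a small script along these lines should do. It is only a sketch: `transfer_errors` is a made-up helper name, and it assumes the log keeps the `<file_xfer_error>` layout shown in this post.

```python
import re

# The result log from this post, pasted in as a string.
LOG = """<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>dg05_d_g30_d291_to_d300_typeC_restr_defWCG0003GCW.str.gzb</file_name>
  <error_code>-200</error_code>
</file_xfer_error>
</message>
]]>"""


def transfer_errors(log_text):
    """Return (file_name, error_code) pairs found in file_xfer_error blocks."""
    pattern = re.compile(
        r"<file_xfer_error>\s*<file_name>(.*?)</file_name>\s*"
        r"<error_code>(-?\d+)</error_code>",
        re.DOTALL,
    )
    return [(name, int(code)) for name, code in pattern.findall(log_text)]


for name, code in transfer_errors(LOG):
    print(f"{name}: error {code}")
```

A regular expression is used instead of an XML parser because the log is not a single well-formed XML document.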
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Just curious again:
Occasionally I get some tasks which finish right after the initial calculations are done (i.e. still at 0.0%). They upload their files, are reported as success and marked as 'PV'. But when the second task is reported, both usually turn to 'Error' and new copies are sent (which again turn from PV to Error as soon as the next pair can be compared). Very rarely two repair WUs turn valid, but even they do not run beyond 0.0%. So why do those tasks turn to Error instead of Invalid? The result log does not report an error, so I guess they are invalid because of result comparison only. In this case I'd expect them to be Invalid.
Result Name: dg05_c037_pqa000_3--
<core_client_version>7.0.8</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
]]> |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It's a semantic thing to me. Server Aborted tasks also log as Error ATM. So do all tasks with zero computing time and those failing without reaching 100%... which is the key. I think everything that does not compute to an expected/correct end [and some computing time, even a fraction of a second] is correctly labelled as "error". Those that compute normally to the end have gone to Invalid on my result status page for DDDT2... as noted twice already, with a clean log like the one you sampled.
--//--
P.S. [ot]7.0.8? How broken an [alpha] client does one want to continue using in production? 7.0.28 is the likely promoted version, though GPU crunchers still have issues and Minimum Buffer is still activating its old High Priority Processing panic mode state when set to over half of the shortest deadline of cached work, even when connect is 24/7 (from 00:00 to 00:00)[/ot]
[Edit 1 times, last edit by Former Member at Jun 26, 2012 7:07:48 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
P.S. [ot]7.0.8? How broken an [alpha] client does one want to continue using in production? 7.0.28 is the likely promoted version, though GPU crunchers still have issues and Minimum Buffer is still activating its old High Priority Processing panic mode state when set to over half of the shortest deadline of cached work, even when connect is 24/7 (from 00:00 to 00:00)[/ot]

Once upon a time I needed that version to run GPU WUs for Poem. Well, it worked, so why change a running system without need? I never liked waiting impatiently for the next promoted version just to install it immediately, even if it replaces a beta version (usually with another beta version...). If any of my projects needs a newer version, I'll have to install it anyway. ;-) |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Just in, g41-d402 in my queue as the final fetch for this run. Set the 0.00 buffer on both MinB & MaxAB. Need 62 days more to compute... have 68 days cached [~4.5 days crunch] to get me to blue. If it's not enough, then it will have to be next time [autumn maybe].
With this g41-d402, approximately 9,000 left in quorum 2 to fetch. Comparing this to production, through last night, 70,656 have validated for this run... 222,000 of planned total minus 71,000 valid - 18,000 to circulate = 133,000 seeking completion on hosts or wingman to release PVal/Pver. The daily validations so far:
20th     940
21st   1,185
22nd   4,723
23rd  14,346
24th  20,988
25th  28,474 < still not exceeded 30K as daily top validations for this cycle.
End of progress estimates. --//-- |
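
For anyone who wants to re-run the numbers, the arithmetic in this post can be checked in a few lines. All figures are copied from the post; the variable names are only illustrative.

```python
# Daily validations for this run, as listed in the post above.
daily_validations = {
    "20th": 940,
    "21st": 1_185,
    "22nd": 4_723,
    "23rd": 14_346,
    "24th": 20_988,
    "25th": 28_474,
}

total_valid = sum(daily_validations.values())
print(total_valid)  # 70656, the "70,656 have validated" figure

planned_total = 222_000      # planned results for the run
validated = 71_000           # total_valid rounded, as in the post
still_to_circulate = 18_000  # roughly 9,000 work units at quorum 2

in_progress = planned_total - validated - still_to_circulate
print(in_progress)  # 133000 on hosts or waiting for a wingman
```

The 18,000 still to circulate lines up with the roughly 9,000 work units left at quorum 2 (two copies each).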
||
|
|