Thread Status: Active
Total posts in this thread: 2370
Posts: 2370   Pages: 237
This topic has been viewed 958478 times and has 2369 replies
Dataman
Ace Cruncher
Joined: Nov 16, 2004
Post Count: 4865
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...

How are WUs that error out handled? I got the following error WU:

Result Name              | App Version Number | Status | Sent Time        | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
dg05_ d294_ pca002_ 6--  | 640                | Error  | 6/25/12 19:49:18 | 6/25/12 20:03:35       | 0.00             | 0.0 / 0.0
dg05_ d294_ pca002_ 5--  | 640                | Error  | 6/25/12 19:47:24 | 6/25/12 19:48:58       | 0.00             | 0.0 / 0.0   <- me
dg05_ d294_ pca002_ 4--  | 640                | Error  | 6/25/12 19:47:21 | 6/25/12 20:02:15       | 0.00             | 0.0 / 0.0
dg05_ d294_ pca002_ 3--  | 640                | Error  | 6/25/12 19:44:08 | 6/25/12 19:45:37       | 0.00             | 0.0 / 0.0
dg05_ d294_ pca002_ 2--  | 640                | Error  | 6/25/12 19:44:06 | 6/25/12 19:45:45       | 0.00             | 0.0 / 0.0
dg05_ d294_ pca002_ 0--  | 640                | Error  | 6/25/12 19:36:54 | 6/25/12 19:37:37       | 0.00             | 0.0 / 0.0
dg05_ d294_ pca002_ 1--  | 640                | Error  | 6/25/12 19:36:54 | 6/25/12 19:40:20       | 0.00             | 0.0 / 0.0

Shouldn't this WU have been caught after the third error? What is the policy for this type of WU? Does it have to fail 7 times before it is taken off the crunching list?

Thanks,
CJSL

The max error for this project is 7. Policy?
----------------------------------------


[Jun 25, 2012 9:31:36 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...

I thought circulation stops after 5 or 6 *reported* errors, meaning there will always be a 7th. The setting varies [but I'm not 100% sure]: maybe lower for the ZR, and a little higher for this one, as some clients do seem to succeed at tasks. So far I have had 2 that got rated invalid for no apparent reason and a few going south at 0:00:01.

Looking at the computing time, the loss was bandwidth use only in this case. No indication given what the error log said... a -200 maybe?
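
For anyone who wants to check the "reported errors" idea against the table quoted above, here is a minimal Python sketch that replays the sent and return times and counts how many errors had already been reported when each copy went out. The timestamps are copied from that table; the function name and the replay itself are only an illustration, not how the server actually decides.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M:%S"

# (copy number, sent time, return time), copied from the table quoted above
RESULTS = [
    (6, "6/25/12 19:49:18", "6/25/12 20:03:35"),
    (5, "6/25/12 19:47:24", "6/25/12 19:48:58"),
    (4, "6/25/12 19:47:21", "6/25/12 20:02:15"),
    (3, "6/25/12 19:44:08", "6/25/12 19:45:37"),
    (2, "6/25/12 19:44:06", "6/25/12 19:45:45"),
    (0, "6/25/12 19:36:54", "6/25/12 19:37:37"),
    (1, "6/25/12 19:36:54", "6/25/12 19:40:20"),
]


def errors_reported_when_sent(results):
    """Map each copy number to the count of errors returned before it was sent."""
    parsed = [
        (copy, datetime.strptime(sent, FMT), datetime.strptime(returned, FMT))
        for copy, sent, returned in results
    ]
    return {
        copy: sum(1 for _, _, returned in parsed if returned < sent)
        for copy, sent, _ in parsed
    }


for copy, count in sorted(errors_reported_when_sent(RESULTS).items()):
    print(f"copy {copy}: {count} error(s) already reported when sent")
```

On these timestamps the copies go out in pairs, and the last copy (6) was sent when 5 errors had been reported, with copy 4 still in progress.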

--//--
[Jun 25, 2012 9:51:37 PM]
Punchy
Advanced Cruncher
Texas
Joined: Nov 30, 2010
Post Count: 60
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...

I have a WU that has been running for 38 hours and estimates another 52 hours to completion. The maximum I have seen for other DDDT2 work on this system is 4 hours. Do I let it continue?

dg05_c459_pca005_0
----------------------------------------

[Jun 25, 2012 10:35:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...

Best suspend that task with LAIM off, then resume it manually or let it sit in "waiting to run" until another task finishes. This method worked for me on a task that seemed to loop at 27% at 4.5 hours CPU time. When resumed, the 75% finished in 3.5 hours.

--//--
[Jun 25, 2012 10:51:15 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...

Just curious... was the 50,000 results left estimate all the remaining B + C work units times quorums, or was that disregarding any quorum multiplier?

Thanks!
[Jun 26, 2012 12:14:51 AM]
cjslman
Master Cruncher
Mexico
Joined: Nov 23, 2004
Post Count: 2082
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...

Shouldn't this WU have been caught after the third error? What is the policy for this type of WU? Does it have to fail 7 times before it is taken off the crunching list?

I apologize for not posting the error log... I wasn't interested in the error... I was more interested in why the WU had to error out so many times. Here it is (yes... error 200):

Result Log

Result Name: dg05_ d294_ pca002_ 5--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>dg05_d_g30_d291_to_d300_typeC_restr_defWCG0003GCW.str.gzb</file_name>
<error_code>-200</error_code>
</file_xfer_error>

</message>
]]>
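
In case it helps anyone collecting these, a small Python sketch for pulling the file name and error code out of a result log like the one above. It uses a regular expression rather than an XML parser, because the log as shown is a fragment and not a well-formed XML document. The function name is made up for illustration; the log text is copied from this post.

```python
import re

# Result log text as posted above
LOG = """<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>dg05_d_g30_d291_to_d300_typeC_restr_defWCG0003GCW.str.gzb</file_name>
<error_code>-200</error_code>
</file_xfer_error>

</message>
]]>"""

XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(.*?)</file_name>\s*"
    r"<error_code>(-?\d+)</error_code>\s*"
    r"</file_xfer_error>",
    re.DOTALL,
)


def transfer_errors(log):
    """Return a list of (file name, error code) pairs found in the log text."""
    return [(name, int(code)) for name, code in XFER_ERROR.findall(log)]


for name, code in transfer_errors(LOG):
    print(f"{name}: error {code}")
```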
----------------------------------------
I follow the Gimli philosophy: "Keep breathing. That's the key. Breathe."
Join The Cahuamos Team


[Jun 26, 2012 12:45:11 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...

Just curious again:
Occasionally I get some tasks which finish right after the initial calculations are done (i.e. still at 0.0%). They upload their files, are reported as success and are marked as 'PV'. But when the second task is reported, both usually turn to 'Error' and new copies are sent (which again turn from PV to Error as soon as the next pair can be compared). Very rarely two repair WUs turn valid, but even they do not run beyond 0.0%.
So why do those tasks turn to Error instead of Invalid?
The result log does not report an error, so I guess they are invalid because of result comparison only. In this case I'd expect them to be Invalid.

Result Name: dg05_ c037_ pqa000_ 3--

<core_client_version>7.0.8</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish

</stderr_txt>
]]>
[Jun 26, 2012 6:32:07 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...

It's a semantic thing to me. Server Aborted tasks also log as Error ATM. So do all tasks with zero computing time and those failing without reaching 100%... which is the key. I think everything that does not compute to an expected/correct end [even with some computing time, if only a fraction of a second] is correctly labelled as "Error". Those that compute normally to the end have gone to Invalid on my result status page for DDDT2... as noted twice already, with a clean log like the one you sampled.

--//--

P.S. [ot]7.0.8? How broken an [alpha] client does one want to continue to use in production? 7.0.28 is the likely promoted version, though GPU crunchers still have issues and Minimum Buffer still activates its old High Priority Processing panic mode when set to over half of the shortest deadline of cached work, even when connect is 24/7 (from 00:00 to 00:00)[/ot]
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 26, 2012 7:07:48 AM]
[Jun 26, 2012 7:06:40 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...


P.S. [ot]7.0.8? How broken an [alpha] client does one want to continue to use in production? 7.0.28 is the likely promoted version, though GPU crunchers still have issues and Minimum Buffer still activates its old High Priority Processing panic mode when set to over half of the shortest deadline of cached work, even when connect is 24/7 (from 00:00 to 00:00)[/ot]


Once upon a time I needed that version to run GPU WUs for Poem. Well, it worked, so why change a running system without need? I never liked to wait impatiently for the next promoted version to install it immediately, even if it replaces a beta version (usually by another beta version...). If any of my projects will need a newer version I'll have to install it anyway. ;-)
[Jun 26, 2012 7:54:28 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New thread for It's raining Dengue, Hallelujah!...

Just in, g41-d402 in my queue as the final fetch for this run. Set the 0.00 buffer on both MinB & MaxAB. Need 62 days more to compute... have 68 days cached [~4.5 days crunch], to get me to blue. If it's not enough, then it will have to be next time [autumn maybe]. :-)

With this g41-d402, approximately 9,000 are left to fetch in quorum 2. Comparing this to production: through last night, 70,656 have validated for this run. So 222,000 planned in total, minus 71,000 valid, minus 18,000 to circulate = 133,000 still seeking completion on hosts or waiting on a wingman to release them from PVal/PVer. The daily validations so far:

20th 0,940
21st 1,185
22nd 4,723
23rd 14,346
24th 20,988
25th 28,474 < still not exceeded 30K as daily top validations for this cycle.

End of progress estimates.
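
For those who like to check the sums, a short Python sketch of the arithmetic above. The daily figures and the 222,000 / 71,000 / 9,000 numbers are the ones posted; treating the 18,000 "to circulate" as 9,000 times the quorum of 2 is my assumption about where that figure comes from.

```python
# Daily validations as posted (June 20th through 25th)
daily_validations = {
    20: 940,
    21: 1185,
    22: 4723,
    23: 14346,
    24: 20988,
    25: 28474,
}

validated = sum(daily_validations.values())

planned_total = 222_000   # planned total results for this run, as posted
valid_rounded = 71_000    # validated so far, rounded up as in the post
left_to_fetch = 9_000     # approximate work units left to fetch
quorum = 2                # assumption: each work unit circulates as 2 copies

to_circulate = left_to_fetch * quorum
outstanding = planned_total - valid_rounded - to_circulate

print(f"validated so far: {validated:,}")
print(f"still to circulate: {to_circulate:,}")
print(f"out on hosts or pending: {outstanding:,}")
```

The daily numbers add up to the 70,656 quoted, and the subtraction gives the 133,000.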

--//--
[Jun 26, 2012 8:39:46 AM]