Thread Status: Active
Total posts in this thread: 12
Author
This topic has been viewed 2115 times and has 11 replies
tavvva
Cruncher
Joined: Apr 11, 2011
Post Count: 5
Status: Offline
Status Invalid

Hello.

One of my tasks has been marked as Invalid and I'd like to investigate what happened.
Is there any chance to download the output data for comparison?

These are the Result Logs for the task:
----------------------------------------------------------------------------
Project Name: Discovering Dengue Drugs - Together - Phase 2
Created: 4/21/11
Name: dg02_b169_pda000
Minimum Quorum: 2
Replication: 3
----------------------------------------------------------------------------
Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
dg02_b169_pda000_2-- | 640 | Valid | 4/28/11 15:16:46 | 4/29/11 05:31:28 | 4.43 | 79.5 / 78.1
dg02_b169_pda000_1-- | 640 | Invalid | 4/23/11 15:33:51 | 4/28/11 15:11:12 | 5.11 | 74.8 / 39.1
dg02_b169_pda000_0-- | 640 | Valid | 4/23/11 15:33:40 | 4/24/11 11:12:19 | 6.19 | 51.9 / 78.1
----------------------------------------------------------------------------
Result Name: dg02_b169_pda000_0--

<core_client_version>6.6.41</core_client_version>
<![CDATA[
<stderr_txt>
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.
called boinc_finish

</stderr_txt>
]]>

Status : Valid
----------------------------------------------------------------------------
Result Name: dg02_b169_pda000_1--

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Copying wcgrestart.rst
Copying wcgrestart.rst
called boinc_finish

</stderr_txt>
]]>

Status : Invalid
----------------------------------------------------------------------------
Result Name: dg02_b169_pda000_2--

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish

</stderr_txt>
]]>

Status : Valid
----------------------------------------------------------------------------
----------------------------------------
[Edit 1 times, last edit by tavvva at May 1, 2011 8:30:18 AM]
[May 1, 2011 8:28:37 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Status Invalid

Hi tavvva,

From your invalid log, it seems the task was restarted once (for example, by exiting the client or rebooting the computer). Per Uplinger, a WCG tech, restarted DDDT2 tasks pick up a slight difference during the model rebuild phase on resume/restart. At validation this produces an intermediate ''inconclusive'' state, which requires a third copy to be computed so that it can be determined which of the two original results is the ''absolute'' correct one.

Two mitigations reduce how often this occurs: hibernate the computer rather than switching it off when it is not needed, and enable 'Leave application in memory when suspended' in the local preferences or in the device profile on the website.

--//--
[May 1, 2011 9:23:56 AM]
tavvva
Cruncher
Joined: Apr 11, 2011
Post Count: 5
Status: Offline
Re: Status Invalid

Do I understand correctly, that it's caused by wrong DDDT2 application design?

My computer is now marked as unreliable, right? :'(
[May 1, 2011 1:11:18 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: Status Invalid

Do I understand correctly, that it's caused by wrong DDDT2 application design?

Or wrong validator-design...

My computer is now marked as unreliable, right? :'(

Yes, one validation error means it's unreliable until enough other tasks have been validated.
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[May 1, 2011 1:24:58 PM]
sk..
Master Cruncher
http://s17.rimg.info/ccb5d62bd3e856cc0d1df9b0ee2f7f6a.gif
Joined: Mar 22, 2007
Post Count: 2324
Status: Offline
Re: Status Invalid

DDDT2 is one of the faster but less available projects, so it might be easier to get to the unreliable state, and harder to recover.
While you might have a few DDDT2 tasks in Pending Validation that move to Valid and get your system back to reliable, HCC is another fast project (quick task completion times), so crunching a few of those might expedite your system's move back to reliable, and then you will get a few more DDDT2 tasks.

One of many reasons to run several projects on each system.
[May 1, 2011 1:47:06 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Status Invalid

Do I understand correctly, that it's caused by wrong DDDT2 application design?

Or wrong validator-design...

My computer is now marked as unreliable, right? :'(

Yes, one validation error means it's unreliable until enough other tasks have been validated.

NOT [as you would know]: at WCG, and probably anywhere else, a fully reliable host needs more than 12 consecutive errors to fall out of the R++ class, i.e. to move from a 0.1% to a > 0.2% error rate. Any good result interspersed would partially restore the reliability.

Case in point: yesterday my quad had 12 error results due to unstable wifi, including while BOINC networking was suspended [a bug that never seems to go away], AND 15-plus manual aborts, yet less than 12 hours later it received a ''repair'' job after about 15 good results.

--//--
[May 1, 2011 2:28:04 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Status Invalid

Do I understand correctly, that it's caused by wrong DDDT2 application design?

My computer is now marked as unreliable, right? :'(

It's not a problem with the application, just something that prevents the model from being rebuilt to the exact umpteenth decimal on resume. The validator could not be programmed to accept these variations as still forming a valid quorum of 2 because, as I noted, ''absolute'' matching is required, though we might not fully appreciate that. What I think could be possible on the credit side (which is not what we're here for, of course) is to give full rather than half credit in these instances. Possibly the validator could be coded with that exception for DDDT2, but I have my doubts... it depends on how generally the credit grant rules have been defined.
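
To picture why a difference in the "umpteenth decimal" matters, here is a minimal sketch contrasting an exact comparison with a tolerance-based one. It is only an illustration: the function names, the idea that a result is a list of doubles, and the tolerance parameter are all assumptions of mine, not the actual WCG validator or DDDT2 output format.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical "absolute" match: every value must be identical.
bool results_match_exactly(const std::vector<double>& a,
                           const std::vector<double>& b) {
    return a == b;
}

// Hypothetical tolerant match: values may differ by at most `tolerance`.
bool results_match_within(const std::vector<double>& a,
                          const std::vector<double>& b,
                          double tolerance) {
    assert(tolerance >= 0.0);
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tolerance) {
            return false;
        }
    }
    return true;
}
```

Two results that differ only by 1e-9 in one value fail the first check but pass the second with a tolerance of 1e-6, which is the kind of case that would need a third copy under exact matching.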

Given what I noted in my post above, that reliability is not binned immediately but only after a dozen "consecutive" errors, it is not a concern for those who crunch consistently with an occasional hiccup.

Hope it helps to get off the panic button

--//--
[May 1, 2011 3:07:06 PM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974
Status: Offline
Re: Status Invalid

NOT [as you would know]: at WCG, and probably anywhere else, a fully reliable host needs more than 12 consecutive errors to fall out of the R++ class, i.e. to move from a 0.1% to a > 0.2% error rate. Any good result interspersed would partially restore the reliability.

Well, for starters, host.error_rate is deprecated, so other projects would use application.error_rate instead ;)

But WCG is still using old server code and is therefore probably still using host.error_rate, and possibly also the older += 0.05 instead of the more recent += 0.1. This latest change was http://boinc.berkeley.edu/trac/changeset/15771/trunk/boinc/sched/validator.C and that link shows both versions of the code...

Assuming WCG is using the old code, the relevant code from validator.C is this:

host.error_rate *= 0.95;         // always applied: decay the rate by 5%
if (!valid) {
    host.error_rate += 0.05;     // invalid result: add a flat 0.05
}

What does this code mean in practice? Let's walk through an example using your starting point of an error_rate of 0.1%. This means host.error_rate = 0.001 at the start.

If valid, your new error_rate becomes:
host.error_rate = host.error_rate * 0.95 = 0.001 * 0.95 = 0.00095, or new error_rate = 0.095 %.

In case of invalid, the code means your new error_rate becomes:
host.error_rate = host.error_rate * 0.95 = 0.001 * 0.95 = 0.00095 // the 1st line; it is outside the if-statement, so it is always calculated
host.error_rate = host.error_rate + 0.05 = 0.00095 + 0.05 = 0.05095, or new error_rate = 5.095 %. // the 2nd line

As you can see, there's a huge difference between multiplying a number by 0.95 and increasing a number by 0.05. Multiplying by 0.95 means decreasing it by 5%, while adding 0.05 does not mean adding 5% of its value; it adds a flat 5 percentage points.

If you wanted to increase it by 5%, you would multiply by 1.05, not add 0.05.
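
The same arithmetic can be reproduced with a small sketch of the update rule. This is only a stand-alone illustration of the quoted snippet: the function name `update_error_rate` and the `penalty` parameter are my own, and the real validator.C works on a host record rather than a bare double.

```cpp
#include <cassert>

// Sketch of the quoted update rule, not the actual BOINC source.
// `penalty` is a hypothetical parameter: 0.05 for the older code,
// 0.1 for the more recent value mentioned in this post.
double update_error_rate(double error_rate, bool valid,
                         double penalty = 0.05) {
    assert(error_rate >= 0.0);
    error_rate *= 0.95;         // always applied, valid or not
    if (!valid) {
        error_rate += penalty;  // flat addition, not a percentage increase
    }
    return error_rate;
}
```

Starting from 0.001, `update_error_rate(0.001, true)` gives 0.00095 and `update_error_rate(0.001, false)` gives 0.05095, up to floating-point rounding, matching the hand calculation.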

Case in point: yesterday my quad had 12 error results due to unstable wifi, including while BOINC networking was suspended [a bug that never seems to go away], AND 15-plus manual aborts, yet less than 12 hours later it received a ''repair'' job after about 15 good results.

You've overlooked one important fact: the reliability rating is updated by the VALIDATOR (remember, the code linked above is in validator.C), but the validator only looks at tasks reported as "SUCCESS"; it doesn't know (and doesn't care) about tasks reported as aborted or as computational errors. Granted, these do also very temporarily affect your reliability rating, since a computer is only "reliable" if it has the max daily quota, but since the quota is doubled on a single "success" report, it's very easy to fulfill this requirement again.


So, assuming the old code, after a single validation error, to get below 0.2% again you'll need... 64 validated results to become "reliable" again (19 validated results only bring the rate back below 2%).
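
As a rough check of that count, here is a minimal sketch that applies only the `*= 0.95` step from the old code until the rate drops below a threshold. The function name `valid_results_needed` and the thresholds passed to it are my own illustration and assume every following result is valid; nothing here is taken from the BOINC source.

```cpp
#include <cassert>

// Counts how many consecutive valid results are needed before
// `error_rate` falls strictly below `threshold`, assuming each valid
// result multiplies the rate by 0.95 (the old rule quoted above).
int valid_results_needed(double error_rate, double threshold) {
    assert(threshold > 0.0);
    int count = 0;
    while (error_rate >= threshold) {
        error_rate *= 0.95;  // one more validated result
        ++count;
    }
    return count;
}
```

Starting from the 0.05095 reached after one invalid result, `valid_results_needed(0.05095, 0.02)` returns 19 and `valid_results_needed(0.05095, 0.002)` returns 64, so the answer depends heavily on which threshold the server actually uses.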
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[May 1, 2011 5:01:43 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Status Invalid

That's fine, old/new/future code, as laid out: 27 technical errors and 15-odd valid returns (you get that when running Zero Redundancy tasks, none passing through an inconclusive or pending validation state :P) [maybe a few validations to add from HCC results that were pending since before the errors], and a new repair job arrived after the 15th consecutive error-free return. Can there be any louder statement that 19 ''valid'' results are not needed after each error, even though the table in the FAQs at > 5% indicates so? I can't possibly imagine there being a bug in the original code, such that all the mass erroring was treated as a single demark?

So snip:
Granted, these do also very temporarily affect your reliability rating, since a computer is only "reliable" if it has the max daily quota, but since the quota is doubled on a single "success" report, it's very easy to fulfill this requirement again.

And that's seemingly what counted for getting a short-deadline job.

Now I wonder: were we discussing reliability, or were we discussing reliability? The latter is what I think is important to tavvva.... not being bounced off the DDDT2 repair feed (a guess: 3%-4% of original work). Or was it the rare A type, of which there will only be 36,000 out of 22.1 million, running in about half the time of the B type, which have no special reliability rating requirement...

to each her/his own... an odd error, right or wrong, just is not a concern.

--//--
[May 1, 2011 6:27:07 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Status Invalid

To be exact about when the repair job was received: maybe someone can prove how this could happen so quickly under the rules Ingleside presented. There were 14 valids returned since the last error (marked with Roman numerals I-XIV) and 4 stuck in PV before the repair job was assigned [which got autostarted, with the help of a 4300-minute switch time and a > 2 day cache, at that time].

Quorum
E201992_ 608_ C.24.C19H12N4S.00058171.4.set1d06_ 2-- 640 Pending Validation 5/1/11 10:12:34 5/1/11 19:00:28 7.25 131.9 / 0.0 < Moi
E201992_ 608_ C.24.C19H12N4S.00058171.4.set1d06_ 1-- - In Progress 5/1/11 10:00:08 5/11/11 10:00:08 0.00 0.0 / 0.0
E201992_ 608_ C.24.C19H12N4S.00058171.4.set1d06_ 0-- 640 Error 5/1/11 09:45:09 5/1/11 09:46:30 0.00 0.0 / 0.0


Result Status pages list for device:
E201992_ 608_ C.24.C19H12N4S.00058171.4.set1d06_ 2-- 1479931 Pending Validation 5/1/11 10:12:34 5/1/11 19:00:28 7.25 131.9 / 0.0 < Repair Job
c4cw_ target03_ 108446601_ 0-- 1479931 Valid 4/28/11 15:51:43 5/1/11 18:56:36 4.58 83.3 / 84.4
c4cw_ target03_ 108447296_ 0-- 1479931 Valid 4/28/11 15:51:40 5/1/11 15:22:36 4.55 82.8 / 83.1
c4cw_ target03_ 108449001_ 0-- 1479931 Valid 4/28/11 15:51:31 5/1/11 15:22:36 4.57 83.1 / 83.1
E201965_ 857_ C.24.C18H10N2OS2Si.00572380.3.set1d06_ 1-- 1479931 Valid 4/28/11 15:51:37 5/1/11 13:44:12 6.78 123.3 / 149.0
c4cw_ target03_ 107558943_ 0-- 1479931 Valid 4/27/11 16:23:30 5/1/11 11:32:17 4.53 82.4 / 83.1
c4cw_ target03_ 107550755_ 0-- 1479931 Valid 4/27/11 16:22:48 5/1/11 10:12:34 4.50 81.9 / 83.9
c4cw_ target03_ 107564002_ 0-- 1479931 Valid 4/27/11 16:23:06 5/1/11 10:12:34 4.50 81.9 / 83.9

c4cw_ target03_ 107554809_ 0-- 1479931 Valid 4/27/11 16:22:48 5/1/11 07:03:27 4.57 83.2 / 82.9 (XIV)
c4cw_ target03_ 107553347_ 0-- 1479931 Valid 4/27/11 16:22:48 5/1/11 07:03:08 4.56 82.9 / 82.9 (XIII)
E201981_ 610_ C.24.C19H11N3S2.00774165.2.set1d06_ 0-- 1479931 Pending Validation 4/30/11 06:42:32 5/1/11 05:12:14 6.58 119.7 / 0.0
E201982_ 205_ C.24.C19H11N3S2.00307157.1.set1d06_ 0-- 1479931 Pending Validation 4/30/11 06:42:32 5/1/11 05:09:33 6.62 120.4 / 0.0
c4cw_ target03_ 107556883_ 0-- 1479931 Valid 4/27/11 16:22:48 5/1/11 01:43:42 4.52 82.2 / 83.6 (XII)
c4cw_ target03_ 107558846_ 0-- 1479931 Valid 4/27/11 16:22:48 5/1/11 01:25:12 4.53 82.3 / 83.8 (XI)
c4cw_ target03_ 107562680_ 0-- 1479931 Valid 4/27/11 16:22:48 4/30/11 22:07:38 4.49 81.6 / 83.6 (X)
c4cw_ target03_ 107563443_ 0-- 1479931 Valid 4/27/11 16:22:48 4/30/11 22:07:38 4.48 81.5 / 83.6 (IX)
c4cw_ target03_ 107561522_ 0-- 1479931 Valid 4/27/11 16:22:48 4/30/11 22:07:15 4.49 81.6 / 83.5 (VIII)
c4cw_ target03_ 107559874_ 0-- 1479931 Valid 4/27/11 16:22:48 4/30/11 22:07:15 4.49 81.6 / 83.5 (VII)
c4cw_ target03_ 107563959_ 0-- 1479931 Valid 4/27/11 16:22:48 4/30/11 17:34:09 4.55 82.7 / 84.1 (VI)
c4cw_ target03_ 107568068_ 0-- 1479931 Valid 4/27/11 16:22:48 4/30/11 17:33:44 4.53 82.4 / 84.1 (V)
X0000067411081200604132106_ 1-- 1479931 Pending Validation 4/30/11 05:52:25 4/30/11 16:27:14 1.08 19.7 / 0.0
X0000067411047200604132107_ 0-- 1479931 Valid 4/30/11 05:52:25 4/30/11 16:26:47 1.09 19.9 / 19.7 (IV)
X0000067411127200604132106_ 0-- 1479931 Pending Validation 4/30/11 05:52:25 4/30/11 16:26:47 1.08 19.6 / 0.0
X0000067411118200604132105_ 0-- 1479931 Valid 4/30/11 05:52:25 4/30/11 16:26:16 1.04 18.8 / 27.7 (III)
E201966_ 389_ C.24.C17H8N4OSSe.00566196.2.set1d06_ 0-- 1479931 Valid 4/28/11 15:52:11 4/30/11 15:25:50 6.95 126.4 / 124.0 (II)
E201966_ 446_ C.24.C17H8N4OSSe.00539718.3.set1d06_ 0-- 1479931 Valid 4/28/11 15:52:12 4/30/11 15:25:15 6.67 121.3 / 121.9 (I)
E201966_ 426_ C.23.C17H10N2OSSeSi.00560402.4.set1d06_ 1-- 1479931 Error 4/28/11 15:51:26 4/30/11 06:42:06 0.51 9.3 / 0.0
c4cw_ target03_ 107551457_ 0-- 1479931 Error 4/27/11 16:22:25 4/30/11 06:42:06 0.37 6.7 / 0.0
c4cw_ target03_ 107561004_ 0-- 1479931 Error 4/27/11 16:22:24 4/30/11 06:42:06 2.91 53.0 / 0.0
c4cw_ target03_ 107557954_ 0-- 1479931 Error 4/27/11 16:22:25 4/30/11 06:42:06 2.38 43.2 / 0.0
c4cw_ target03_ 107543536_ 0-- 1479931 Error 4/27/11 16:22:25 4/30/11 06:42:06 0.00 0.0 / 0.0
E201966_ 205_ C.24.C17H10N4S2Si.00706781.0.set1d06_ 1-- 1479931 Error 4/28/11 15:51:34 4/30/11 06:42:06 0.52 9.4 / 0.0
X0000058570539200509291621_ 0-- 1479931 Valid 4/29/11 09:18:47 4/29/11 13:26:58 1.24 22.8 / 26.8
c4cw_ target03_ 107561209_ 0-- 1479931 Error 4/27/11 16:22:03 4/29/11 13:26:34 0.25 4.6 / 0.0
c4cw_ target03_ 107561472_ 0-- 1479931 Error 4/27/11 16:22:03 4/29/11 13:26:34 0.74 13.6 / 0.0
X0000058570522200509291622_ 0-- 1479931 User Aborted 4/29/11 09:18:47 4/29/11 09:20:37 0.00 0.0 / 0.0
dg02_ c081_ pca000_ 0-- 1479931 Valid 4/27/11 16:22:24 4/29/11 05:17:04 4.64 85.2 / 77.0
X0000066820218200604121626_ 0-- 1479931 Error 4/28/11 13:40:00 4/28/11 17:37:57 0.24 4.4 / 0.0
c4cw_ target03_ 107549649_ 0-- 1479931 Error 4/27/11 16:21:44 4/28/11 17:37:57 3.46 63.5 / 0.0
X0000066840459200603291345_ 0-- 1479931 User Aborted 4/28/11 15:52:16 4/28/11 16:08:52 0.00 0.0 / 0.0
X0000066841430200603231223_ 0-- 1479931 User Aborted 4/28/11 15:53:40 4/28/11 16:08:08 0.00 0.0 / 0.0
X0000066840075200603291351_ 1-- 1479931 User Aborted 4/28/11 15:53:41 4/28/11 16:08:08 0.00 0.0 / 0.0
X0000066840270200603291348_ 1-- 1479931 User Aborted 4/28/11 15:53:40 4/28/11 16:08:08 0.00 0.0 / 0.0
X0000066841169200603231227_ 1-- 1479931 User Aborted 4/28/11 15:53:40 4/28/11 16:08:08 0.00 0.0 / 0.0
X0000066820289200604121625_ 1-- 1479931 User Aborted 4/28/11 13:40:27 4/28/11 13:42:13 0.00 0.0 / 0.0
X0000066820167200604121627_ 0-- 1479931 User Aborted 4/28/11 13:40:00 4/28/11 13:42:13 0.00 0.0 / 0.0
X0000066820253200604121625_ 1-- 1479931 User Aborted 4/28/11 13:40:28 4/28/11 13:42:13 0.00 0.0 / 0.0

NB: who knows whether Linux clients have favorable rules, given that they probably constitute no more than 12-13% of the total WCG contributing devices, in credit terms.
[May 1, 2011 7:48:31 PM]