Thread Status: Active
Total posts in this thread: 37
GB033533
Senior Cruncher
UK
Joined: Dec 8, 2004
Post Count: 198
Status: Offline
Why did wu error?

I got sent a wu because the original wingman had not replied. It seemed to process normally, with no interrupts, and my claim for the 6 hours was normal. But the validator said 'error', I got no credit, and the wu went out to two other crunchers, who successfully processed it.

So why did mine (and the first cruncher's) go into error?

We all have the same msg in the result log:
"<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Finishing early because max runtime has been exceeded.21606.406250
called boinc_finish"
though the guy at the top of the list has core client v6.10.29

CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 3-- 614 Valid 04/02/10 18:20:45 05/02/10 22:22:28 6.00 133.0 / 108.6
CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 4-- 614 Valid 04/02/10 18:18:45 05/02/10 05:46:44 6.00 106.6 / 137.6
CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 2-- 614 Error 04/02/10 04:43:43 04/02/10 18:09:32 6.00 94.9 / 0.0 <-- mine
CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 0-- 614 Error 25/01/10 04:24:28 01/02/10 01:16:13 6.00 46.4 / 0.0
CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 1-- 614 Error 25/01/10 04:20:05 07/02/10 04:50:54 0.00 0.0 / 0.0

Thanks
----------------------------------------

[Feb 8, 2010 9:15:16 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Why did wu error?

More curious: why did the No Reply turn to Error 2.25 days after the quorum was technically complete, long after the last 2 had returned and validated?

This requires a tech review... so stand by for when s/he looks in.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Feb 8, 2010 9:22:10 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Why did wu error?

Got a few errors without error codes where the wingmen also failed. Both seemingly ran through to the end, on a variety of clients: W7 for mine, unknown for the wingmen. (It would be nice to see that in the log, along with CPU info.)

1)

CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 3-- - In Progress 25-2-10 02:17:26 1-3-10 02:17:26 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 2-- - In Progress 25-2-10 02:16:03 1-3-10 02:16:03 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 0-- 614 Error 22-2-10 13:42:46 25-2-10 02:10:24 3.56 65.4 / 0.0 < moi
CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 1-- 614 Error 22-2-10 13:42:15 23-2-10 20:50:19 5.98 62.1 / 0.0

2)

CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 3-- - In Progress 25-2-10 02:16:04 1-3-10 02:16:04 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 2-- - In Progress 25-2-10 02:15:11 1-3-10 02:15:11 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 1-- 614 Error 21-2-10 00:19:09 24-2-10 19:33:46 3.68 118.7 / 0.0 < moi
CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 0-- 614 Error 21-2-10 00:16:47 25-2-10 02:12:24 2.91 63.7 / 0.0

Copy of all identical logs:
Result Name: CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 1--
<core_client_version>6.10.34</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish

</stderr_txt>
]]>

----------------------------------------
[Feb 25, 2010 7:39:45 AM]
GB033533
Senior Cruncher
UK
Joined: Dec 8, 2004
Post Count: 198
Status: Offline
Re: Why did wu error?

Sek, sorry to see you've also had errors. Now I've had three more end in error once the wingmen had returned their results:

CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 381_ 456662_ 457253_ 3-- 614 Pending Validation 2/25/10 01:43:27 2/25/10 06:56:14 3.03 55.3 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 381_ 456662_ 457253_ 2-- - In Progress 2/25/10 01:43:26 3/1/10 01:43:26 0.00 0.0 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 381_ 456662_ 457253_ 1-- 614 Error 2/24/10 07:02:41 2/25/10 01:36:57 3.10 58.4 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 381_ 456662_ 457253_ 0-- 614 Error 2/24/10 07:01:10 2/24/10 21:19:19 3.64 56.7 / 0.0 <-- mine

CMD2_ 0349-MYH3.clustersOccur-2QOU_ F.clustersOccur_ 15_ 99605_ 101215_ 2-- - In Progress 2/25/10 01:26:58 3/1/10 01:26:58 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2QOU_ F.clustersOccur_ 15_ 99605_ 101215_ 3-- - In Progress 2/25/10 01:25:11 3/1/10 01:25:11 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2QOU_ F.clustersOccur_ 15_ 99605_ 101215_ 0-- 614 Error 2/23/10 19:55:14 2/25/10 01:19:53 4.04 49.9 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2QOU_ F.clustersOccur_ 15_ 99605_ 101215_ 1-- 614 Error 2/23/10 19:52:29 2/24/10 07:01:10 3.07 47.7 / 0.0 <-- mine

CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 29_ 2-- - In Progress 2/25/10 03:10:45 3/1/10 03:10:45 0.00 0.0 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 29_ 3-- - In Progress 2/25/10 03:09:24 3/1/10 03:09:24 0.00 0.0 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 29_ 0-- 614 Error 2/23/10 05:25:23 2/25/10 03:05:13 7.15 93.8 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 29_ 1-- 614 Error 2/23/10 05:18:41 2/23/10 22:11:06 6.81 106.0 / 0.0 <-- mine

Prior to the first one I mentioned, I had never had an error. And the only errors I saw were where there was zero, or almost zero runtime from wingmen. But these all appear to have successfully run to completion.

What am I doing wrong all of a sudden? For a minor cruncher like me, it's a lot of lost time and effort....
----------------------------------------

[Feb 25, 2010 8:18:31 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Why did wu error?

The commonality here is batches 349/350-MYH3, though a parent is involved too. I've got a whole series of 349 children, some completing shortly, but since these errors I've had 349s and 348s passing and validating fine. What I do know almost for sure is that the first result of the quorum gets marked as normal in PV, and only the validation process then turns them to Error once both have returned. I'd have noticed otherwise, plus the BOINCTasks result history marks the status as OK on completion:

World Community Grid 6.14 Help Cure Muscular Dystrophy - Phase 2 CMD2_0349-MYH3.clustersOccur-1WUU_D.clustersOccur_216_593103_593711 03:44:41 (03:40:48) 24-02-2010 20:33 24-02-2010 20:33 Reported: Ok

Anyway, making a dump of the current HCMD2 PV results to see if the observation holds...

edit:... should more errors develop... then it could be something in the validation process.

edit2: The near-simultaneous transmission of the 2 extra copies indicates it is occurring during validation.
----------------------------------------
[Edit 2 times, last edit by Sekerob at Feb 25, 2010 9:11:07 AM]
[Feb 25, 2010 8:37:41 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Why did wu error?

Seems to be some rogue batches.
I have had six errors overnight, some without error codes and some with:

Result Log

Result Name: CMD2_ 0350-MYH3.clustersOccur-3CWB_ P.clustersOccur_ 150_ 0--



<core_client_version>6.10.18</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Finishing early because max runtime has been exceeded.21613.564148
called boinc_finish

The above is from a wu lasting only 6 hours.
It appears somewhat random: the first from 23 Feb 14.04, the last 24 Feb 20.35.
I also noticed that I have received work units from this project and from FAAH over the last couple of days with short 4 day return times for no apparent reason, e.g. no error, out of time, inconclusive etc., just normal work units.
So maybe a validator problem.
Chris.
----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 25, 2010 8:47:35 AM]
[Feb 25, 2010 8:46:41 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Why did wu error?

This is normal:

Finishing early because max runtime has been exceeded.21613.564148

and is the standard 6 hour cut-off line, logged when 60% of the positions were not completed.
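
As a quick sanity check on that figure, here is a minimal Python sketch that pulls the trailing number out of the log line and converts it to hours. It assumes the number is elapsed run time in seconds (the log itself does not label it), and `runtime_hours` is a hypothetical helper name, not anything from BOINC.

```python
import re

# Assumption: the number at the end of the message is run time in seconds.
LINE = "Finishing early because max runtime has been exceeded.21613.564148"

def runtime_hours(line):
    """Return the trailing number of the log line converted to hours."""
    match = re.search(r"exceeded\.(\d+(?:\.\d+)?)\s*$", line)
    if match is None:
        raise ValueError("no trailing runtime found")
    return float(match.group(1)) / 3600.0

print(round(runtime_hours(LINE), 2))  # 6.0
```

Under that assumption, both values quoted in this thread (21606.406250 and 21613.564148) come out just over 6 hours, which matches the 6.00 CPU hours shown on the status pages.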

4 day return times are repair jobs [normal is 10], sent because your computer is deemed reliable. They could be repair tasks for whatever has been going on for half a day or so now.

My last 15, 3 in PV:

CMD2_ 0349-MYH3.clustersOccur-2HJH_ A.clustersOccur_ 322_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-1KLO_ A.clustersOccur_ 168_ 967971_ 969286_ 0-- 1112084 Pending Validation
CMD2_ 0348-MYH3.clustersOccur-2O72_ A.clustersOccur_ 109_ 394893_ 395885_ 1-- 1112084 Pending Validation
CMD2_ 0349-MYH3.clustersOccur-2QOV_ 3.clustersOccur_ 659_ 1-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-3CMQ_ A.clustersOccur_ 38_ 106887_ 107575_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 0-- 1112084 Error
CMD2_ 0348-MYH3.clustersOccur-1M6B_ A.clustersOccur_ 192_ 445174_ 445636_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-2QOV_ K.clustersOccur_ 76_ 1-- 1112084 Pending Validation
CMD2_ 0348-MYH3.clustersOccur-3D9T_ A.clustersOccur_ 17_ 134794_ 136732_ 0-- 1112084 Valid
CMD2_ 0348-MYH3.clustersOccur-2JDQ_ A.clustersOccur_ 200_ 448095_ 448832_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 1-- 1112084 Error
CMD2_ 0328-1433GA.clustersOccur-1BY1_ A.clustersOccur_ 5_ 119367_ 121961_ 120615_ 121961_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-2DAT_ A.clustersOccur_ 94_ 0-- 1112084 Valid
CMD2_ 0348-MYH3.clustersOccur-1M6B_ A.clustersOccur_ 327_ 756284_ 756818_ 0-- 1112084 Valid
CMD2_ 0348-MYH3.clustersOccur-2QNK_ A.clustersOccur_ 70_ 0-- 1112084 Valid
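
To see whether the errors cluster in one batch, the list above can be tallied per batch. This is only a rough Python sketch: the names are copied from the list with the forum's spaces after underscores removed, and the batch label is assumed to be the text between `CMD2_` and the first dot. `batch_of` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# (result name, status) pairs copied from the list above, with the
# spaces after underscores removed (assumed to be forum line-wrap residue).
RESULTS = [
    ("CMD2_0349-MYH3.clustersOccur-2HJH_A.clustersOccur_322_0--", "Valid"),
    ("CMD2_0349-MYH3.clustersOccur-1KLO_A.clustersOccur_168_967971_969286_0--", "Pending Validation"),
    ("CMD2_0348-MYH3.clustersOccur-2O72_A.clustersOccur_109_394893_395885_1--", "Pending Validation"),
    ("CMD2_0349-MYH3.clustersOccur-2QOV_3.clustersOccur_659_1--", "Valid"),
    ("CMD2_0349-MYH3.clustersOccur-3CMQ_A.clustersOccur_38_106887_107575_0--", "Valid"),
    ("CMD2_0349-MYH3.clustersOccur-2EE6_A.clustersOccur_68_558488_560348_0--", "Error"),
    ("CMD2_0348-MYH3.clustersOccur-1M6B_A.clustersOccur_192_445174_445636_0--", "Valid"),
    ("CMD2_0349-MYH3.clustersOccur-2QOV_K.clustersOccur_76_1--", "Pending Validation"),
    ("CMD2_0348-MYH3.clustersOccur-3D9T_A.clustersOccur_17_134794_136732_0--", "Valid"),
    ("CMD2_0348-MYH3.clustersOccur-2JDQ_A.clustersOccur_200_448095_448832_0--", "Valid"),
    ("CMD2_0349-MYH3.clustersOccur-1WUU_D.clustersOccur_216_593103_593711_1--", "Error"),
    ("CMD2_0328-1433GA.clustersOccur-1BY1_A.clustersOccur_5_119367_121961_120615_121961_0--", "Valid"),
    ("CMD2_0349-MYH3.clustersOccur-2DAT_A.clustersOccur_94_0--", "Valid"),
    ("CMD2_0348-MYH3.clustersOccur-1M6B_A.clustersOccur_327_756284_756818_0--", "Valid"),
    ("CMD2_0348-MYH3.clustersOccur-2QNK_A.clustersOccur_70_0--", "Valid"),
]

def batch_of(name):
    """Batch label, assumed to be the text between 'CMD2_' and the first '.'."""
    return name.split("_", 1)[1].split(".", 1)[0]

# Count statuses separately for each batch.
tally = defaultdict(Counter)
for name, status in RESULTS:
    tally[batch_of(name)][status] += 1

for batch in sorted(tally):
    print(batch, dict(tally[batch]))
```

In this sample both Errors fall in 0349-MYH3 and none in 0348-MYH3, but with only 15 results that is a hint rather than evidence.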
----------------------------------------
[Edit 2 times, last edit by Sekerob at Feb 25, 2010 8:59:26 AM]
[Feb 25, 2010 8:55:47 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Why did wu error?

Hi Sekerob, not normal: as stated in my post, these were not repair jobs, just normal wus.
Singles on FAAH and doubles here.
Sent with 4 day deadlines. To my knowledge that is not the norm.
Cheers
Chris.
[Feb 25, 2010 9:07:49 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Why did wu error?

That's what I said. Extra Make Up/Repair/No Reply copies are sent out with a 4 day deadline, ONLY to known reliable devices with known under 4 day return times.
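
For anyone wanting to check their own In Progress rows against that rule of thumb, a small Python sketch follows. It assumes the day-month-year timestamp style of the status rows quoted earlier in the thread (US-style rows would need a different format string), and both function names are hypothetical. A 4 day deadline only suggests an extra copy under the rule above; the status page itself does not say so.

```python
from datetime import datetime

# Assumption: day-month-2-digit-year timestamps, e.g. "25-2-10 02:17:26".
FMT = "%d-%m-%y %H:%M:%S"

def deadline_days(sent, due):
    """Days between the Sent Time and Time Due of an In Progress result."""
    delta = datetime.strptime(due, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / 86400.0

def looks_like_extra_copy(sent, due, short_days=4.0):
    """True when the deadline matches the 4 day window (normal is 10)."""
    return abs(deadline_days(sent, due) - short_days) < 0.01

# Sent Time / Time Due taken from one of the In Progress rows quoted earlier.
print(deadline_days("25-2-10 02:17:26", "1-3-10 02:17:26"))  # 4.0
```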

Make Ups are sometimes sent out to push the completion of a batch or to test a new batch... well, there you have a possible answer. Certainly for FAAH the techs do that with every new batch; let's call them Reconnaissance tasks :D

edit: those FAAH test tasks are really meant to discover whether they've been sized to run at a fairly average run time, not so much to determine whether there are errors, e.g. wrong parms.
----------------------------------------
[Edit 4 times, last edit by Sekerob at Feb 25, 2010 9:22:19 AM]
[Feb 25, 2010 9:12:36 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Why did wu error?

Thanks for that.
But for the cruncher we have no way of knowing.


Result Name   App Version Number   Status   Sent Time   Time Due / Return Time   CPU Time (hours)   Claimed / Granted BOINC Credit
CMD2_ 0349-MYH3.clustersOccur-2QOV_ C.clustersOccur_ 182_ 504572_ 505628_ 504715_ 504943_ 504841_ 504943_ 1-- 614 Pending Validation 24/02/10 20:08:45 25/02/10 03:28:22 1.29 27.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2QOV_ C.clustersOccur_ 182_ 504572_ 505628_ 504715_ 504943_ 504841_ 504943_ 0-- - In Progress 24/02/10 20:08:41 28/02/10 20:08:41 0.00 0.0 / 0.0

Never noticed this before and have had a lot lately.
Also, all but 1 FAAH unit returned recently have been marked inconclusive.
So I still think the validator is suspect.
My RAC is getting screwed :(
Chris
[Feb 25, 2010 9:23:05 AM]