World Community Grid Forums
Category: Completed Research Forum: Help Cure Muscular Dystrophy - Phase 2 Forum Thread: Why did wu error? |
Thread Status: Active Total posts in this thread: 37
GB033533
Senior Cruncher UK Joined: Dec 8, 2004 Post Count: 198 Status: Offline Project Badges: |
I got sent a wu because the original wingman had not replied. It seemed to process normally, with no interrupts, and my claim for the 6 hours was normal. But the validator said 'error', I got no credit, and the wu went out to two other crunchers, who successfully processed it.

So why did mine (and the first cruncher's) go into error? We all have the same msg in the result log:

<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Finishing early because max runtime has been exceeded.21606.406250 called boinc_finish

though the guy at the top of the list has core client v6.10.29.

CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 3-- 614 Valid 04/02/10 18:20:45 05/02/10 22:22:28 6.00 133.0 / 108.6
CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 4-- 614 Valid 04/02/10 18:18:45 05/02/10 05:46:44 6.00 106.6 / 137.6
CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 2-- 614 Error 04/02/10 04:43:43 04/02/10 18:09:32 6.00 94.9 / 0.0 <-- mine
CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 0-- 614 Error 25/01/10 04:24:28 01/02/10 01:16:13 6.00 46.4 / 0.0
CMD2_ 0314-MYH14.clustersOccur-2QOU_ M.clustersOccur_ 44_ 1-- 614 Error 25/01/10 04:20:05 07/02/10 04:50:54 0.00 0.0 / 0.0

Thanks |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
More curious: why did the No Reply turn to Error 2.25 days after the quorum was technically complete, long after the last 2 returned / validated?
This requires a tech review... so stand by for when s/he looks in.
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Got a few errorcodeless errors where the wingmen also failed. Both seemingly having run through to the end, variety of clients. W7 for mine, unknown for wingmen. (would be nice to see in log with CPU info)

1)
CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 3-- - In Progress 25-2-10 02:17:26 1-3-10 02:17:26 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 2-- - In Progress 25-2-10 02:16:03 1-3-10 02:16:03 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 0-- 614 Error 22-2-10 13:42:46 25-2-10 02:10:24 3.56 65.4 / 0.0 < moi
CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 1-- 614 Error 22-2-10 13:42:15 23-2-10 20:50:19 5.98 62.1 / 0.0

2)
CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 3-- - In Progress 25-2-10 02:16:04 1-3-10 02:16:04 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 2-- - In Progress 25-2-10 02:15:11 1-3-10 02:15:11 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 1-- 614 Error 21-2-10 00:19:09 24-2-10 19:33:46 3.68 118.7 / 0.0 < moi
CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 0-- 614 Error 21-2-10 00:16:47 25-2-10 02:12:24 2.91 63.7 / 0.0

Copy of all identical logs:
Result Name: CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 1--
||
|
GB033533
Senior Cruncher UK Joined: Dec 8, 2004 Post Count: 198 Status: Offline Project Badges: |
Sek, sorry to see you've also had errors. Now I've had three more end in error, once the wingmen had returned their results:

CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 381_ 456662_ 457253_ 3-- 614 Pending Validation 2/25/10 01:43:27 2/25/10 06:56:14 3.03 55.3 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 381_ 456662_ 457253_ 2-- - In Progress 2/25/10 01:43:26 3/1/10 01:43:26 0.00 0.0 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 381_ 456662_ 457253_ 1-- 614 Error 2/24/10 07:02:41 2/25/10 01:36:57 3.10 58.4 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 381_ 456662_ 457253_ 0-- 614 Error 2/24/10 07:01:10 2/24/10 21:19:19 3.64 56.7 / 0.0 <-- mine

CMD2_ 0349-MYH3.clustersOccur-2QOU_ F.clustersOccur_ 15_ 99605_ 101215_ 2-- - In Progress 2/25/10 01:26:58 3/1/10 01:26:58 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2QOU_ F.clustersOccur_ 15_ 99605_ 101215_ 3-- - In Progress 2/25/10 01:25:11 3/1/10 01:25:11 0.00 0.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2QOU_ F.clustersOccur_ 15_ 99605_ 101215_ 0-- 614 Error 2/23/10 19:55:14 2/25/10 01:19:53 4.04 49.9 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2QOU_ F.clustersOccur_ 15_ 99605_ 101215_ 1-- 614 Error 2/23/10 19:52:29 2/24/10 07:01:10 3.07 47.7 / 0.0 <-- mine

CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 29_ 2-- - In Progress 2/25/10 03:10:45 3/1/10 03:10:45 0.00 0.0 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 29_ 3-- - In Progress 2/25/10 03:09:24 3/1/10 03:09:24 0.00 0.0 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 29_ 0-- 614 Error 2/23/10 05:25:23 2/25/10 03:05:13 7.15 93.8 / 0.0
CMD2_ 0350-MYH3.clustersOccur-1WAK_ A.clustersOccur_ 29_ 1-- 614 Error 2/23/10 05:18:41 2/23/10 22:11:06 6.81 106.0 / 0.0 <-- mine

Prior to the first one I mentioned, I had never had an error. And the only errors I saw were where there was zero, or almost zero, runtime from wingmen. But these all appear to have successfully run to completion. What am I doing wrong all of a sudden? For a minor cruncher like me, it's a lot of lost time and effort.... |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Commonality here batches 349/350-MYH3, but also a parent. Got a whole series of 349 children and some completing shortly, but since these errors had 349's+348's passing and validating fine. What I do know:

World Community Grid 6.14 Help Cure Muscular Dystrophy - Phase 2 CMD2_0349-MYH3.clustersOccur-1WUU_D.clustersOccur_216_593103_593711 03:44:41 (03:40:48) 24-02-2010 20:33 24-02-2010 20:33 Reported: Ok

Anyway, making a dump of the current HCMD2 PV results to see if the observation holds...

edit: ... should more errors develop... then it could be something in the validation process.
edit2: The near simultaneous extra 2 copies transmission indicates it occurring during validation.
[Edit 2 times, last edit by Sekerob at Feb 25, 2010 9:11:07 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Seems to be some rogue batches.
Have had six errors overnight, some without error codes and some with:

Result Log
Result Name: CMD2_ 0350-MYH3.clustersOccur-3CWB_ P.clustersOccur_ 150_ 0--
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Finishing early because max runtime has been exceeded.21613.564148 called boinc_finish

The above is from a wu lasting only 6 hours. It appears somewhat random: the first from 23 Feb 14.04, the last 24 Feb 20.35.
Also noticed that I have received work units from this project, and also from faah, over the last couple of days with short 4 day return times for no reason, e.g. no error, out of time, inconclusive etc., just normal work units.
So maybe a validator problem.
Chris.
[Edit 1 times, last edit by Former Member at Feb 25, 2010 8:47:35 AM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
This is normal:

Finishing early because max runtime has been exceeded.21613.564148

and is the standard 6 hour cut off line when 60% of the positions was not completed. 4 day return times are repair jobs [normal is 10], your computer deemed reliable. Could be repair tasks for what has been going on now for half a day or so. My last 15, 3 in PV:

CMD2_ 0349-MYH3.clustersOccur-2HJH_ A.clustersOccur_ 322_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-1KLO_ A.clustersOccur_ 168_ 967971_ 969286_ 0-- 1112084 Pending Validation
CMD2_ 0348-MYH3.clustersOccur-2O72_ A.clustersOccur_ 109_ 394893_ 395885_ 1-- 1112084 Pending Validation
CMD2_ 0349-MYH3.clustersOccur-2QOV_ 3.clustersOccur_ 659_ 1-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-3CMQ_ A.clustersOccur_ 38_ 106887_ 107575_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-2EE6_ A.clustersOccur_ 68_ 558488_ 560348_ 0-- 1112084 Error
CMD2_ 0348-MYH3.clustersOccur-1M6B_ A.clustersOccur_ 192_ 445174_ 445636_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-2QOV_ K.clustersOccur_ 76_ 1-- 1112084 Pending Validation
CMD2_ 0348-MYH3.clustersOccur-3D9T_ A.clustersOccur_ 17_ 134794_ 136732_ 0-- 1112084 Valid
CMD2_ 0348-MYH3.clustersOccur-2JDQ_ A.clustersOccur_ 200_ 448095_ 448832_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-1WUU_ D.clustersOccur_ 216_ 593103_ 593711_ 1-- 1112084 Error
CMD2_ 0328-1433GA.clustersOccur-1BY1_ A.clustersOccur_ 5_ 119367_ 121961_ 120615_ 121961_ 0-- 1112084 Valid
CMD2_ 0349-MYH3.clustersOccur-2DAT_ A.clustersOccur_ 94_ 0-- 1112084 Valid
CMD2_ 0348-MYH3.clustersOccur-1M6B_ A.clustersOccur_ 327_ 756284_ 756818_ 0-- 1112084 Valid
CMD2_ 0348-MYH3.clustersOccur-2QNK_ A.clustersOccur_ 70_ 0-- 1112084 Valid
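For anyone who wants to check the 6 hour cutoff against their own result logs, here is a minimal Python sketch. It assumes the number printed just before "called boinc_finish" is the elapsed run time in seconds; the thread does not say so explicitly, but 21613.56 seconds is roughly 6.00 hours, which fits. The function names are made up for illustration and are not part of BOINC or the WCG site.

```python
import re
from typing import Optional

# Assumption: the value logged before "called boinc_finish" is elapsed
# run time in seconds. This is an interpretation, not documented behaviour.
MAX_RUNTIME_SECONDS = 6 * 3600  # the 6 hour cutoff described in this thread

FINISH_PATTERN = re.compile(r"(\d+\.\d+)\s+called boinc_finish")


def finish_seconds(stderr_text: str) -> Optional[float]:
    """Return the value logged before 'called boinc_finish', or None."""
    match = FINISH_PATTERN.search(stderr_text)
    return float(match.group(1)) if match else None


def hit_runtime_cutoff(stderr_text: str) -> bool:
    """True if the log shows a finish at or past the 6 hour cutoff."""
    seconds = finish_seconds(stderr_text)
    return seconds is not None and seconds >= MAX_RUNTIME_SECONDS


# Log line quoted earlier in this thread.
log = ("Finishing early because max runtime has been exceeded."
       "21613.564148 called boinc_finish")
print(round(finish_seconds(log) / 3600, 2), "hours")
print("hit cutoff:", hit_runtime_cutoff(log))
```

A result that trips this check has simply run into the time limit, which by itself is the normal case described above and does not explain a validator error.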
[Edit 2 times, last edit by Sekerob at Feb 25, 2010 8:59:26 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi Sekerob, not normal: these were not repair jobs, as stated in my post, just normal wus.
Singles on faah and doubles here, sent with 4 day deadlines. To my knowledge that is not the norm. Cheers, Chris. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
That's what I said. Extra Make Up/Repair/No Reply copies are sent out with a 4 day deadline, ONLY to known reliable devices with known under 4 day return times.
Make Ups are sometimes sent out to push the completion of a batch or to test a new batch... well, there you have a possible answer. Certainly for faah the techs do that every new batch, let's call them Reconnaissance tasks :D
edit: those FAAH test tasks are really meant to discover whether they've been sized to run at a fairly average run time, not so much to determine if there are errors, e.g. wrong parms.
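To spot these short-deadline copies in a results list, the deadline length can be worked out from the Sent Time and Time Due columns of an In Progress row. A rough Python sketch follows, using the 4 day and 10 day figures quoted in this thread; the function names and the "short"/"normal"/"other" labels are hypothetical, and the date format has to be matched to however your results page displays dates.

```python
from datetime import datetime

# Figures quoted in this thread: repair/make-up copies get 4 days, normal is 10.
REPAIR_DEADLINE_DAYS = 4
NORMAL_DEADLINE_DAYS = 10


def deadline_days(sent: str, due: str, fmt: str = "%d-%m-%y %H:%M:%S") -> float:
    """Days between Sent Time and Time Due for an In Progress row."""
    sent_at = datetime.strptime(sent, fmt)
    due_at = datetime.strptime(due, fmt)
    return (due_at - sent_at).total_seconds() / 86400


def classify_deadline(days: float) -> str:
    """Label a deadline length; the labels are illustrative only."""
    if days <= REPAIR_DEADLINE_DAYS:
        return "short"
    if days >= NORMAL_DEADLINE_DAYS:
        return "normal"
    return "other"


# One of the In Progress rows posted earlier: sent 25-2-10, due 1-3-10.
days = deadline_days("25-2-10 02:17:26", "1-3-10 02:17:26")
print(days, classify_deadline(days))
```

This only works on rows still In Progress, since for returned results the second timestamp is the return time rather than the due time.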
[Edit 4 times, last edit by Sekerob at Feb 25, 2010 9:22:19 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Thanks for that.
But for the cruncher, we have no way of knowing.

Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
CMD2_ 0349-MYH3.clustersOccur-2QOV_ C.clustersOccur_ 182_ 504572_ 505628_ 504715_ 504943_ 504841_ 504943_ 1-- 614 Pending Validation 24/02/10 20:08:45 25/02/10 03:28:22 1.29 27.0 / 0.0
CMD2_ 0349-MYH3.clustersOccur-2QOV_ C.clustersOccur_ 182_ 504572_ 505628_ 504715_ 504943_ 504841_ 504943_ 0-- - In Progress 24/02/10 20:08:41 28/02/10 20:08:41 0.00 0.0 / 0.0

Never noticed this before and have had a lot lately. Also, all but 1 faah unit returned recently has been marked inconclusive. So I still think the validator is suspect. My RAC is getting screwed.
Chris |
||
|
|