| World Community Grid Forums
Thread Status: Locked
Total posts in this thread: 12
GenCom.org
Cruncher Joined: Nov 27, 2006 Post Count: 17 Status: Offline |
Hi,
Since this morning, a lot of my Human Proteome Folding 2 work units show the status "Error", although according to the BOINC agent log all of them finished correctly. Here are some examples:
la652_00084-- Error 03/20/2007 07:43:54 03/22/2007 07:29:04 11.80 76.1 / 0.0
la646_00052-- Error 03/19/2007 21:17:04 03/22/2007 07:20:12 3.82 75.4 / 0.0
la663_00025-- Error 03/20/2007 22:36:12 03/22/2007 06:33:01 6.76 64.0 / 0.0
la649_00020-- Error 03/20/2007 00:45:21 03/22/2007 06:15:29 3.37 70.7 / 0.0
la645_00030-- Error 03/19/2007 19:59:52 03/22/2007 05:45:27 3.62 71.5 / 0.0
la644_00007-- Error 03/19/2007 19:05:53 03/22/2007 05:40:16 4.06 80.0 / 0.0
la636_00011-- Error 03/18/2007 19:31:03 03/22/2007 05:39:25 15.07 98.5 / 0.0
la653_00088-- Error 03/20/2007 09:31:02 03/22/2007 04:23:45 8.81 93.0 / 0.0
la647_00018-- Error 03/19/2007 21:07:29 03/22/2007 02:55:29 3.23 67.9 / 0.0
la647_00020-- Error 03/19/2007 21:07:29 03/22/2007 02:50:18 3.22 67.6 / 0.0
And the BOINC log:
22/03/2007 08:10:18|World Community Grid|Computation for task la646_00052_17 finished
22/03/2007 07:09:47|World Community Grid|Computation for task la649_00020_12 finished
22/03/2007 05:05:42|World Community Grid|Computation for task la647_00006_9 finished
Any idea?
Regards
[Edit 4 times, last edit by GenCom.org at Mar 22, 2007 3:32:58 PM] |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Is this one machine (e.g. a quad-core) or several (dual-core / single-core)?
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
GenCom.org
Cruncher Joined: Nov 27, 2006 Post Count: 17 Status: Offline |
It is on several computers (QX6700, E6300, E4300, Xeon 3.0, P-IV 3.2, Sempron 2800+, ...). It seems that these errors started with this unit:
la652_00052-- Error 20/03/2007 06:34:34 21/03/2007 11:36:01 6,17 70,2 / 0,0
[Edit 1 times, last edit by GenCom.org at Mar 22, 2007 9:39:08 AM] |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Do they all run the same antivirus program, and did a definition update happen this morning?
The spread of received dates and the block of completion/return times since the errors started suggest a local problem. No one else has reported this so far.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
GenCom.org
Cruncher Joined: Nov 27, 2006 Post Count: 17 Status: Offline |
Not the same for all: some computers use Avast, some eTrust, and some run without any antivirus.
Have a look at the details for la658_00071, la658_00025, la657_00032 or la652_00052; you will see that everyone has an error status.
[Edit 2 times, last edit by GenCom.org at Mar 22, 2007 10:03:48 AM] |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
"Not the same for all: some computers use Avast, some eTrust, and some run without any antivirus. Have a look at the details for la652_00052; you will see that everyone has an error status."
Now that last bit is very useful info: a basic check of whether the problem is local or widespread, done by looking at the result status detail. That's why I've compiled the 'Issue?' Q&A under the link in my signature. I will alert staff! In the meantime, could you post a sample of the quorum detail? I cannot see it; only the techs can, and they are still asleep.
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
GenCom.org
Cruncher Joined: Nov 27, 2006 Post Count: 17 Status: Offline |
Some units now have the status "Too Late"!
la648_00045 Too Late 03/19/2007 22:35:15 03/22/2007 10:03:38 3.63 71.7 / 0.0
Here is the detail for this one (https://secure.worldcommunitygrid.org/ms/devi...tus.do?workunitId=4555271):
la648_00045-- Error 21/03/2007 01:43:02 21/03/2007 12:07:57 6,95 87,2 / 0,0
la648_00045-- Error 20/03/2007 12:52:10 21/03/2007 07:48:23 3,57 50,5 / 0,0
la648_00045-- Error 20/03/2007 03:30:17 20/03/2007 13:53:19 5,41 64,5 / 0,0
la648_00045-- Error 20/03/2007 00:52:24 21/03/2007 01:38:30 0,00 0,0 / 0,0
la648_00045-- Error 19/03/2007 23:30:15 21/03/2007 00:20:42 8,54 66,5 / 0,0
la648_00045-- Error 19/03/2007 23:06:48 20/03/2007 03:29:08 2,68 17,2 / 0,0
la648_00045-- In Progress 19/03/2007 23:02:34 28/03/2007 23:02:34 0,00 0,0 / 0,0
la648_00045-- Error 19/03/2007 23:02:21 20/03/2007 13:40:36 7,65 71,5 / 0,0
la648_00045-- In Progress 19/03/2007 22:57:27 28/03/2007 22:57:27 0,00 0,0 / 0,0
la648_00045-- Error 19/03/2007 22:56:46 20/03/2007 20:49:59 11,95 84,1 / 0,0
la648_00045-- Error 19/03/2007 22:49:27 21/03/2007 18:07:25 4,79 53,9 / 0,0
la648_00045-- Too Late 19/03/2007 22:35:15 22/03/2007 10:03:38 3,63 71,7 / 0,0
la648_00045-- Error 19/03/2007 22:34:02 20/03/2007 08:15:57 8,19 76,4 / 0,0
la648_00045-- Error 19/03/2007 22:29:26 22/03/2007 01:14:57 11,34 71,4 / 0,0
la648_00045-- Error 19/03/2007 22:27:56 20/03/2007 17:48:12 6,74 50,0 / 0,0
la648_00045-- In Progress 19/03/2007 22:27:44 28/03/2007 22:27:44 0,00 0,0 / 0,0
la648_00045-- Error 19/03/2007 22:26:47 20/03/2007 22:55:51 11,53 59,4 / 0,0
la648_00045-- Error 19/03/2007 22:26:37 21/03/2007 05:35:47 14,02 67,6 / 0,0
la648_00045-- Error 19/03/2007 22:25:36 20/03/2007 12:47:55 8,33 15,0 / 0,0
la648_00045-- Error 19/03/2007 22:25:14 20/03/2007 04:07:30 3,63 64,6 / 0,0
la648_00045-- In Progress 19/03/2007 22:18:26 28/03/2007 22:18:26 0,00 0,0 / 0,0 |
||
|
|
olympic
Senior Cruncher Joined: Jun 12, 2005 Post Count: 156 Status: Offline |
Same here, I'm seeing approximately a 50% error rate with HPF2. The WU takes the full amount of CPU time and finishes normally with no error messages in the BOINC log. I also have one listed as "too late" even though it was returned within 3 days.
I had turned off HPF2 a while back due to occasional errors; boy, did I pick a bad time to turn it back on! Aborting the rest in the queue ASAP. |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
It's very possibly just a server-side validation error, as the 'Too Late' status simply should not occur. I suggest you Suspend rather than Abort for now.
You may not have seen it yet, but WUs can now be aborted remotely with version 5.8 and up (but only if the client initiated the contact with the servers!). See here for how it works/worked for Genome Comparison: http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=12459#90438 The remote abort routine was introduced in an accelerated development push by WCG. The function already existed on the client side, but not on the server side. We're on BOINC server v 5.09 now, according to the message logs.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All!
[Edit 2 times, last edit by Sekerob at Mar 22, 2007 2:11:47 PM] |
||
|
|
GenCom.org
Cruncher Joined: Nov 27, 2006 Post Count: 17 Status: Offline |
It seems that everything is working again. Great job! |
||
|
|
|