Thread Status: Active
Total posts in this thread: 7
This topic has been viewed 8156 times and has 6 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Server Aborted

I'm getting lots of WUs sent to all my machines. They do not start, and then get Server Aborted a couple of hours later?.......
[Sep 5, 2011 5:05:24 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Server Aborted

.....and now no new tasks available?????
[Sep 5, 2011 5:07:29 AM]
LCB001
Advanced Cruncher
CANADA
Joined: Oct 14, 2009
Post Count: 69
Status: Offline
Re: Server Aborted

----------------------------------------

[Sep 5, 2011 5:09:57 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Server Aborted

.....thanks :D
[Sep 5, 2011 5:12:21 AM]
yoro42
Ace Cruncher
United States
Joined: Feb 19, 2011
Post Count: 8979
Status: Offline
Re: Server Aborted

Same problem here. Thanks for the info. Details follow in case they help.

Project Name: Drug Search for Leishmaniasis
Created: 09/04/2011 13:06:34
Name: DSFL_00000005_0000013_0943
Minimum Quorum: 2
Replication: 2

DSFL_00000005_0000013_0943_0-- 619 Server Aborted 9/4/11 21:48:11 9/5/11 04:29:17 0.00 0.0 / 0.0
DSFL_00000005_0000013_0943_1-- 619 Server Aborted 9/4/11 21:47:09 9/5/11 05:48:29 0.00 0.0 / 0.0

Result Log
Result Name: DSFL_00000005_0000013_0943_0--
<core_client_version>6.12.33</core_client_version>

AND

Project Name: Drug Search for Leishmaniasis
Created: 09/04/2011 13:06:40
Name: DSFL_00000005_0000013_0326
Minimum Quorum: 2
Replication: 2

DSFL_00000005_0000013_0326_2-- - In Progress 9/4/11 21:50:13 9/8/11 21:50:13 0.00 0.0 / 0.0
DSFL_00000005_0000013_0326_0-- 619 Error 9/4/11 21:47:54 9/4/11 21:49:09 0.00 0.0 / 0.0

Result Log
Result Name: DSFL_00000005_0000013_0326_0--
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
- exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[23:48:16] Number of tasks = 40
[23:48:16] Starting job 0,CPU time is 0.000000.
[23:48:16] ZINC08593712.pdbqt size = 32 4 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000005.pdbqt size = 9257 0
Application exited with RC = 0x1
23:48:17 (2964): called boinc_finish

</stderr_txt>
]]>

DSFL_00000005_0000013_0326_1-- 619 Server Aborted 9/4/11 21:47:53 9/5/11 04:29:17 0.00 0.0 / 0.0

Result Log
Result Name: DSFL_00000005_0000013_0326_1--
<core_client_version>6.12.33</core_client_version>
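
In case anyone wants to scan a pile of these result logs for the failing ones, here is a minimal sketch that pulls the exit code out of a `<message>` block like the one in the errored result above (exit code 195, 0xc3). The function name `parse_exit_code` and the regular expression are my own assumptions based only on the log format pasted here; this is not a BOINC or World Community Grid tool.

```python
import re

# Matches lines such as "- exit code 195 (0xc3)" as seen in the pasted log.
# The pattern is an assumption drawn from that single example.
EXIT_CODE_RE = re.compile(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)")


def parse_exit_code(log_text):
    """Return the decimal exit code found in log_text, or None if absent."""
    match = EXIT_CODE_RE.search(log_text)
    if match is None:
        return None
    decimal = int(match.group(1))
    hexadecimal = int(match.group(2), 16)
    # The log prints the same value twice; for non-negative codes,
    # report a mismatch instead of silently picking one.
    if decimal >= 0 and decimal != hexadecimal:
        raise ValueError("decimal and hex exit codes disagree")
    return decimal


sample = """<message>
- exit code 195 (0xc3)
</message>"""

print(parse_exit_code(sample))  # prints 195
```

A log with no `<message>` block, like the two Server Aborted results, simply returns `None`.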
----------------------------------------

[Sep 5, 2011 5:33:34 PM]
JSYKES
Senior Cruncher
Joined: Apr 28, 2007
Post Count: 206
Status: Offline
Re: Server Aborted

There is potentially another problem, possibly related to the frequency of checkpoints. Five of my machines were hit by a power cut on Saturday and were not restarted until this morning (Monday). All the DSFL WUs that were underway continued to conclusion and returned results (4+ to 6+ hours of processing time each), but they have all been 'errored'. That makes me ask whether the power cut introduced a flaw, so that a WU restarted not from the last good checkpoint but from a random point that corrupted the returned data....
In addition, I have also seen, as have others, more error-prone units and server aborts as a result....I hope this is not a repeat of the previous problems that we encountered with 'buggy' WUs....
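
To illustrate the kind of failure being asked about: a common guard against resuming from a half-written checkpoint after a power cut is to write the checkpoint to a temporary file, attach a checksum, and rename it into place only once it is complete. The sketch below is a generic illustration under those assumptions. It is not the DSFL application's actual checkpoint code, and `save_checkpoint` and `load_checkpoint` are hypothetical names.

```python
import hashlib
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path so a crash never leaves a partial file in place."""
    payload = json.dumps(state, sort_keys=True)
    record = {
        "sha256": hashlib.sha256(payload.encode()).hexdigest(),
        "payload": payload,
    }
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory first.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # Swap the finished file into place in a single step.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if the file is missing or corrupt."""
    try:
        with open(path) as handle:
            record = json.load(handle)
        payload = record["payload"]
        if hashlib.sha256(payload.encode()).hexdigest() != record["sha256"]:
            return None  # checksum mismatch: treat as no checkpoint
        return json.loads(payload)
    except (OSError, ValueError, KeyError, TypeError, AttributeError):
        return None
```

With a scheme like this, a task that finds a damaged checkpoint starts again from the beginning, or from an older good copy, and does not carry on from a random point.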
----------------------------------------

----------------------------------------
[Edit 1 times, last edit by JSYKES at Sep 5, 2011 8:21:45 PM]
[Sep 5, 2011 8:20:36 PM]
JEklund2
Advanced Cruncher
Finland
Joined: Aug 10, 2006
Post Count: 119
Status: Offline
Re: Server Aborted

Howdy,
which batch were these WUs from?

Based on the following info from knreed, the root cause might be the errors in the batch, not the power problem?

>Targets 7 and 8 look like they are running correctly. Target 6 does have errors so we have cancelled those workunits.
>
>We are starting a 100 workunit test with targets 9-15. Those have been sent out.
>
>We have now resumed distributing work for this project for targets 7 and 8. We will run the project at a reduced pace until we have some process improvements in place to avoid this type of issue in the future.

my 0.02 euros :-)
----------------------------------------

[Sep 5, 2011 8:30:42 PM]