Thread Status: Active
Total posts in this thread: 28
Posts: 28   Pages: 3   [ Previous Page | 1 2 3 | Next Page ]
This topic has been viewed 6332 times and has 27 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

It's good news that the researchers will at least end up with valid results from those particular workunits. Thanks for the update.
[Sep 11, 2013 8:07:20 PM]
littlepeaks
Veteran Cruncher
USA
Joined: Apr 28, 2007
Post Count: 748
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

OK, I got a similar problem last night. I errored out on the first job (job #0) with a 0x1; everyone else errored out with a 0x1 or 0x100 on the first job.

E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 7-- 640 Pending Validation 9/26/13 17:19:20 9/26/13 17:30:43 0.15 3.6 / 0.0
E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 6-- - In Progress 9/26/13 17:16:40 9/29/13 17:16:40 0.00 0.0 / 0.0
E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 4-- 640 Error 9/26/13 15:50:59 9/26/13 17:03:14 0.14 5.6 / 0.0
E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 5-- 640 Error 9/26/13 15:50:42 9/26/13 16:50:27 0.23 4.2 / 0.0
E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 3-- 640 Error 9/26/13 14:04:32 9/26/13 14:23:34 0.24 7.2 / 0.0
E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 2-- 640 Error 9/26/13 13:57:03 9/26/13 15:43:54 0.18 4.2 / 0.0
E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 1-- 640 Error 9/26/13 08:19:12 9/26/13 12:26:47 0.18 4.3 / 0.0
E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 0-- 640 Error 9/26/13 08:14:01 9/26/13 13:42:18 0.17 4.7 / 0.0

I am copy number 0. The PV (Copy # _7) got a 0x100 on job 0.
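
For anyone who wants to tally a pasted status listing like the one above, here is a minimal sketch. It assumes each row has the shape shown in this post (result name ending in `--`, app version, status, then a sent date); the function name `tally_statuses` and the regular expression are illustrative guesses based on these rows, not anything defined by World Community Grid.

```python
import re
from collections import Counter

# Row shape inferred from the listing pasted in this thread (an assumption):
#   <result name ending in "--"> <app version or "-"> <status> <sent date> ...
ROW_RE = re.compile(
    r"^(?P<name>.+?--)\s+(?P<version>\S+)\s+(?P<status>.+?)\s+\d{1,2}/\d{1,2}/\d{2}\b"
)


def tally_statuses(rows):
    """Count how many rows carry each status; rows that do not match are ignored."""
    counts = Counter()
    for row in rows:
        match = ROW_RE.match(row.strip())
        if match:
            counts[match.group("status")] += 1
    return counts


ROWS = [
    "E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 7-- 640 Pending Validation 9/26/13 17:19:20 9/26/13 17:30:43 0.15 3.6 / 0.0",
    "E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 6-- - In Progress 9/26/13 17:16:40 9/29/13 17:16:40 0.00 0.0 / 0.0",
    "E215793_ 616_ I.26.C17H7N7O2.00265778.2.set1d06_ 4-- 640 Error 9/26/13 15:50:59 9/26/13 17:03:14 0.14 5.6 / 0.0",
]

for status, count in tally_statuses(ROWS).items():
    print(status, count)
```

The status alone does not show whether a copy ended with 0x1 or 0x100; that still has to be read from each result log.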
[Sep 26, 2013 7:14:51 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

Yep, getting the same here. Looks like a bad batch.


Result Log

Result Name: E215751_ 782_ I.23.C15F3H6N3O2.00214865.3.set1d06_ 3--



<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[01:39:50] Number of jobs = 16
[01:39:50] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[01:54:02] Finished Job #0
[01:54:02] Starting job 1,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #1
[01:54:02] Starting job 2,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #2
[01:54:02] Starting job 3,CPU time has been restored to 841.718996.

Snip...................................

[01:54:02] Skipping Job #13
[01:54:02] Starting job 14,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #14
[01:54:02] Starting job 15,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #15
01:54:02 (4528): called boinc_finish

</stderr_txt>
]]>
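
If you have several of these result logs to compare, a rough sketch like the following can pull out the exit code reported for each job and the jobs that were skipped. The patterns are inferred only from the log excerpt above, and `summarize_stderr` is a hypothetical helper name, so treat it as a starting point rather than an official parser.

```python
import re

# Patterns inferred from the stderr excerpt in this post (an assumption,
# not an official BOINC or CEP2 log specification).
START_RE = re.compile(r"Starting job (\d+)")
RC_RE = re.compile(r"Application exited with RC = (0x[0-9a-fA-F]+)")
SKIP_RE = re.compile(r"Skipping Job #(\d+)")


def summarize_stderr(text):
    """Return ([(job, exit_code), ...], [skipped job numbers]) for one log."""
    current_job = None
    exits = []
    skipped = []
    for line in text.splitlines():
        started = START_RE.search(line)
        if started:
            current_job = int(started.group(1))
            continue
        exited = RC_RE.search(line)
        if exited:
            # The RC is printed in hex, e.g. 0x1 or 0x100.
            exits.append((current_job, int(exited.group(1), 16)))
            continue
        skip = SKIP_RE.search(line)
        if skip:
            skipped.append(int(skip.group(1)))
    return exits, skipped


SAMPLE = """\
[01:39:50] Number of jobs = 16
[01:39:50] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[01:54:02] Finished Job #0
[01:54:02] Starting job 1,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #1
"""

exits, skipped = summarize_stderr(SAMPLE)
print(exits)
print(skipped)
```

The sketch only records what the log says; it does not decide whether a given RC counts as valid.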


[Sep 26, 2013 8:06:55 PM]
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

I see a lot of WUs failing recently. Many of them have multiple resends and most are failing.

E215758_ 214_ I.23.C17H7N5S.00268758.4.set1d06_ 8-- 640 Error
E215758_ 214_ I.23.C17H7N5S.00268758.4.set1d06_ 7-- 640 Error
E215758_ 214_ I.23.C17H7N5S.00268758.4.set1d06_ 5-- 640 Error
E215758_ 214_ I.23.C17H7N5S.00268758.4.set1d06_ 6-- 640 Error
E215758_ 214_ I.23.C17H7N5S.00268758.4.set1d06_ 3-- 640 Error
E215758_ 214_ I.23.C17H7N5S.00268758.4.set1d06_ 4-- 640 Error
E215758_ 214_ I.23.C17H7N5S.00268758.4.set1d06_ 2-- 640 Error
E215758_ 214_ I.23.C17H7N5S.00268758.4.set1d06_ 1-- 640 Error
E215758_ 214_ I.23.C17H7N5S.00268758.4.set1d06_ 0-- 640 Error
----------------------------------------

[Sep 27, 2013 6:40:45 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

Maybe the tasks are OK but very short, and the system considers them errors. I got E215802_952_I.27.C17F3H6N5O2.00202347.0.set1d06 with a run time of 0.09 hr; it exited with RC = 0x1 and the other 15 jobs were skipped.
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 27, 2013 12:33:13 PM]
[Sep 27, 2013 12:32:03 PM]
kateiacy
Veteran Cruncher
USA
Joined: Jan 23, 2010
Post Count: 1027
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

I've had a half dozen in the last couple of days in which all wingmen had errors, for example:

Workunit Status

Project Name: The Clean Energy Project - Phase 2
Created: 09/24/2013 12:45:08
Name: E215774_571_I.23.C17H11N5S.00224467.2.set1d06
Minimum Quorum: 2
Replication: 2


Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
E215774_ 571_ I.23.C17H11N5S.00224467.2.set1d06_ 7-- 640 Error 9/26/13 21:41:38 9/26/13 22:39:19 0.18 3.7 / 0.0
E215774_ 571_ I.23.C17H11N5S.00224467.2.set1d06_ 8-- 640 Error 9/26/13 21:39:16 9/27/13 13:50:54 0.26 3.6 / 0.0
E215774_ 571_ I.23.C17H11N5S.00224467.2.set1d06_ 5-- 640 Error 9/26/13 14:53:15 9/26/13 21:27:53 0.22 4.8 / 0.0
E215774_ 571_ I.23.C17H11N5S.00224467.2.set1d06_ 6-- 640 Error 9/26/13 14:47:34 9/26/13 15:05:51 0.18 2.7 / 0.0
E215774_ 571_ I.23.C17H11N5S.00224467.2.set1d06_ 4-- 640 Error 9/25/13 17:55:55 9/25/13 18:32:50 0.19 3.0 / 0.0
E215774_ 571_ I.23.C17H11N5S.00224467.2.set1d06_ 3-- 640 Error 9/25/13 17:55:47 9/26/13 14:41:07 0.26 4.4 / 0.0
E215774_ 571_ I.23.C17H11N5S.00224467.2.set1d06_ 2-- 640 Error 9/25/13 13:26:58 9/25/13 17:51:58 0.24 3.8 / 0.0
E215774_ 571_ I.23.C17H11N5S.00224467.2.set1d06_ 1-- 640 Error 9/25/13 13:14:26 9/25/13 15:17:28 0.28 4.9 / 0.0
E215774_ 571_ I.23.C17H11N5S.00224467.2.set1d06_ 0-- 640 Error 9/25/13 12:48:31 9/25/13 13:06:18 0.25 3.9 / 0.0

As others have seen, some of the errors are RC = 0x1 and others are RC = 0x100.
----------------------------------------

[Sep 27, 2013 11:51:31 PM]
keithhenry
Ace Cruncher
Senile old farts of the world ....uh.....uh..... nevermind
Joined: Nov 18, 2004
Post Count: 18667
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

I had a couple of WUs error out today. Both showed "Application exited with RC = 0x1" about 20-30 minutes into job #0. I have to think this is a different problem from the one this thread originally described. Those WUs ran for more normal times, and the problem was in validation, not in the crunching.
----------------------------------------
[Sep 30, 2013 4:34:32 AM]
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

I have tons of such WUs, but nobody seems to care about that...
----------------------------------------

[Sep 30, 2013 6:22:07 AM]
seippel
Former World Community Grid Tech
Joined: Apr 16, 2009
Post Count: 392
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

We are aware of the increase in work units failing for CEP2 and are working with the Harvard team to resolve the issue. The problem is that some work units cause a fatal error in the Q-Chem code. Ideally these work units would be identified ahead of time, but if that proves impossible we will make sure this is handled on the validation side. Until a more permanent solution can be found, work units that experience this problem are manually being given credit.

Seippel
[Sep 30, 2013 6:57:27 PM]
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Status: Offline
Re: Wingmen error with RC = 0x100, mine errors with normally-valid RC = 0x1

We are aware of the increase in work units failing for CEP2 and are working with the Harvard team to resolve the issue. The problem is that some work units cause a fatal error in the Q-Chem code. Ideally these work units would be identified ahead of time, but if that proves impossible we will make sure this is handled on the validation side. Until a more permanent solution can be found, work units that experience this problem are manually being given credit.

Seippel


Thanks for the ACK.
----------------------------------------

[Sep 30, 2013 7:30:07 PM]