Thread Status: Active
Total posts in this thread: 23
Posts: 23   Pages: 3   [ 1 2 3 | Next Page ]
This topic has been viewed 3048 times and has 22 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Application exited with RC = 0x100

The bottom two results of the distribution so far are both 'error' results ending in 'RC = 0x100', and there seems little hope that the next two repair tasks will complete any differently.

restored to 26133.491513.
[05:22:26] Starting new Job
[05:22:26] Qink name = fldman
[05:22:38] Qink name = gesman
[05:22:40] Qink name = scfman
Application exited with RC = 0x100
[09:16:59] Finished Job #6
[09:16:59] Starting job 7,CPU time has been restored to 39822.467590.
[09:16:59] Skipping Job #7
09:17:06 (4698): called boinc_finish

E225650_ 557_ S.332.C44H33N5.BRTJOOAAAPRXBR-UHFFFAOYSA-N.15_ s1_ 14_ 3-- - In Progress 10/5/14 14:40:33 10/9/14 02:40:33 0.00 0.0 / 0.0
E225650_ 557_ S.332.C44H33N5.BRTJOOAAAPRXBR-UHFFFAOYSA-N.15_ s1_ 14_ 2-- - In Progress 10/5/14 14:40:31 10/9/14 02:40:31 0.00 0.0 / 0.0
E225650_ 557_ S.332.C44H33N5.BRTJOOAAAPRXBR-UHFFFAOYSA-N.15_ s1_ 14_ 1-- 700 Error 10/4/14 19:41:41 10/5/14 08:52:09 11.17 279.5 / 0.0
E225650_ 557_ S.332.C44H33N5.BRTJOOAAAPRXBR-UHFFFAOYSA-N.15_ s1_ 14_ 0-- 700 Error 10/4/14 19:39:16 10/5/14 14:39:02 11.04 288.7 / 0.0

According to the XML export API the outcome was 3, meaning error, with a validation state of 2, meaning invalid. Which is it?

Outcome: Return results based on the outcome of their processing. 1 means success, 3 means error, 4 means no reply, 6 means validation error, 7 means abandoned.
ValidateState: Return results based on the validation status. 0 means pending validation, 1 means valid, 2 means invalid, 4 means pending verification, 5 means results failed to validate within given deadline.
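
For reference, a minimal Python sketch of the two code tables quoted above. The dictionary and function names are made up for illustration and are not part of the WCG API; only the numeric codes and their meanings come from the descriptions above. It shows that the two fields are reported separately, so one result can carry outcome 3 and validate state 2 at the same time.

```python
# Code tables copied from the API descriptions quoted in this post.
# OUTCOME, VALIDATE_STATE and describe() are hypothetical names.
OUTCOME = {
    1: "success",
    3: "error",
    4: "no reply",
    6: "validation error",
    7: "abandoned",
}

VALIDATE_STATE = {
    0: "pending validation",
    1: "valid",
    2: "invalid",
    4: "pending verification",
    5: "failed to validate within given deadline",
}


def describe(outcome, validate_state):
    """Return a readable label for an (outcome, validate state) pair."""
    o = OUTCOME.get(outcome, f"unknown ({outcome})")
    v = VALIDATE_STATE.get(validate_state, f"unknown ({validate_state})")
    return f"outcome={o}; validate_state={v}"


print(describe(3, 2))
```

Codes not listed in the descriptions above are reported as unknown rather than guessed.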


The exit code is zero, which the API does not define, but to the agent it means no error was recorded on the host and all was normal; therefore the result is more appropriately an 'invalid'.
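
One possible reading of the 0x100 value, offered purely as an assumption and not confirmed anywhere in this thread: if the wrapper prints the raw wait status of the child program in the conventional Unix layout (exit code in the high byte, terminating signal in the low seven bits), then 0x100 would mean the child exited with code 1, while the task as a whole can still report exit code zero. A small Python sketch of that decoding, with a hypothetical function name:

```python
def decode_wait_status(status):
    """Split a 16-bit wait status into (exit_code, signal).

    ASSUMPTION: the value uses the conventional Unix layout, with the
    exit code in bits 8-15 and the terminating signal in bits 0-6.
    Nothing in this thread confirms that RC is encoded this way.
    """
    exit_code = (status >> 8) & 0xFF
    signal = status & 0x7F
    return exit_code, signal


print(decode_wait_status(0x100))
```

Under that assumption the call returns exit code 1 and signal 0, i.e. a normal exit with a non-zero code rather than a crash.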

This conundrum has been brought up before, and yes, it's understood that some tasks will not do what the program is supposed to do, but should this not be a WCG/Clean Energy problem rather than an issue for the volunteer who gets the 'error' slap in the face? 22 hours down the hole and more to go, on this one alone. The validator rules can surely be set to not spit out 'error' and let you park these results aside internally, yes? From what I have read, 'rc = nnn' was treated in the past as 'not the volunteer's problem', so why is it now?

I will monitor this one to see whether taking the task out earlier is justified when RC = 0x100 occurs.

Searching the forum history, I found threads going back to 2010 that include replies like 'we'll give credit', which is the least of my interest. The issue is that these are recorded as a node error, so the host is forced out of reliable status and wastes its next 20 tasks computing with a wingman when it could genuinely do them alone. Compute those wasted hours per day. FYI, this was a technician's reply in 2013, http://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=431871, saying it's a validator margin issue to be corrected.

Please fix the public-facing treatment. It is disconcerting, disturbing, and drives volunteers away.
[Oct 5, 2014 4:42:29 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Application exited with RC = 0x100

I need an expansion pack for the brain-pan, because when going through the valids on the result status page I see this:

E225627_ 28_ S.304.C34H14N10O2.MQSULQBIBCOMGA-UHFFFAOYSA-N.11_ s1_ 14_ 1-- 700 Valid 10/3/14 14:08:42 10/4/14 19:55:53 18.00 482.8 / 436.3
E225627_ 28_ S.304.C34H14N10O2.MQSULQBIBCOMGA-UHFFFAOYSA-N.11_ s1_ 14_ 0-- 700 Valid 10/3/14 14:04:44 10/4/14 11:19:38 14.99 389.8 / 436.3

The top one was killed for exceeding the maximum time, not making it through job #6; mine went on and got, drum roll, RC = 0x100, yet was declared valid. The only thing I can think of is that in this instance the two tasks were compared only on the jobs that _1 managed to complete, through #5, and everything beyond that was taken for granted. Which one was declared canonical and is being sent to Harvard is the 65,001 USD question.

Result _1 end piece of log.

[19:58:23] Finished Job #5
[19:58:23] Starting job 6,CPU time has been restored to 46882.071564.
[19:58:23] Starting new Job
[19:58:23] Qink name = fldman
[19:58:32] Qink name = gesman
[19:58:35] Qink name = scfman
Killing job because cpu time limit has been exceeded. 46882.071564||17917.965591||0.000000
[02:07:16] Finished Job #6
02:07:22 (6298): called boinc_finish

</stderr_txt>
]]>

Result _0 end piece of log.

[08:27:34] Finished Job #5
[08:27:34] Starting job 6,CPU time has been restored to 36752.327027.
[08:27:34] Starting new Job
[08:27:34] Qink name = fldman
[08:27:43] Qink name = gesman
[08:27:45] Qink name = scfman
Application exited with RC = 0x100
[13:14:40] Finished Job #6
[13:14:40] Starting job 7,CPU time has been restored to 53626.065955.
[13:14:40] Skipping Job #7
13:14:45 (1374): called boinc_finish

</stderr_txt>
]]>
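
To spot which job hit the non-zero return code in logs like the two above, something along these lines could be used. This is only a sketch: the regular expressions are written against the lines quoted in this thread, the function name is hypothetical, and real stderr output may contain other formats.

```python
import re

# Patterns match the log lines quoted in this thread only (an assumption
# about the general format of the stderr output).
START = re.compile(r"Starting job (\d+),")
RC = re.compile(r"Application exited with RC = (0x[0-9A-Fa-f]+)")


def jobs_with_rc(log_text):
    """Return (job_number, rc_value) for every RC line in the log.

    job_number is None when no 'Starting job N,' line precedes the RC
    line, which happens when the pasted excerpt is truncated.
    """
    current_job = None
    hits = []
    for line in log_text.splitlines():
        m = START.search(line)
        if m:
            current_job = int(m.group(1))
            continue
        m = RC.search(line)
        if m:
            hits.append((current_job, int(m.group(1), 16)))
    return hits


# Tail of the result _0 log quoted above.
sample = """\
[08:27:34] Finished Job #5
[08:27:34] Starting job 6,CPU time has been restored to 36752.327027.
[08:27:34] Starting new Job
[08:27:34] Qink name = fldman
[08:27:43] Qink name = gesman
[08:27:45] Qink name = scfman
Application exited with RC = 0x100
[13:14:40] Finished Job #6
[13:14:40] Starting job 7,CPU time has been restored to 53626.065955.
[13:14:40] Skipping Job #7
"""

print(jobs_with_rc(sample))
```

For the result _0 excerpt this reports job 6 with the value 256 (0x100), which matches what the log shows by eye.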

Summary: if the wingman hits the 18-hour boundary before being allowed to finish job #6 with RC = 0x100, and the other one does finish it, things are fine. Feels like rigged dice at ocean 16.
[Oct 5, 2014 5:36:07 PM]
WinterGuard1944
Cruncher
Czech Republic
Joined: Apr 23, 2013
Post Count: 5
Status: Offline
Re: Application exited with RC = 0x100

Hello,

I noticed this funny thing:

E225634_ 883_ S.322.C42H28N6.SLLJKJQOKISNAC-UHFFFAOYSA-N.13_ s1_ 14_ 1-- 700 Valid 3.10.14 23:48:39 5.10.14 11:22:07 18.00 313.8 / 177.3
E225634_ 883_ S.322.C42H28N6.SLLJKJQOKISNAC-UHFFFAOYSA-N.13_ s1_ 14_ 0-- 700 Valid 3.10.14 23:45:36 4.10.14 06:25:39 5.90 285.4 / 1,240.8

My work unit was the lucky one. My computer finished all jobs and ended with RC = 0x100 in job #6, while the other computer exceeded the maximum time while still doing job #0. Maybe it is useful for you to know what happened in this case.
[Edit 1 times, last edit by WinterGuard1944 at Oct 5, 2014 7:32:55 PM]
[Oct 5, 2014 7:26:00 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Application exited with RC = 0x100

I'm not going to try to work out how the 177.3 was arrived at, but one job got 177.3, so the other, which did 7, gets 1,240.80. But this thread is -not- about the borked points methodology; there are many active threads touching on the CEP2 credits. This is about the -validation- itself. In that respect your case is the same as the one in my previous post. Just because one of the two in a quorum does not manage to reach the failure point in job #6, the result is suddenly rated valid. Baffling.
[Oct 5, 2014 9:44:22 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Application exited with RC = 0x100

As I posted in another thread somewhere, this is the reason I don't run this project anymore. Inconsistency in the errors and a total lack of concern about it. As I stated earlier, no explanation, no crunching.
[Oct 6, 2014 12:06:17 AM]
Eric_Kaiser
Veteran Cruncher
Germany (Hessen)
Joined: May 7, 2013
Post Count: 1047
Status: Offline
Re: Application exited with RC = 0x100

Started again 2 weeks ago with cep2 and didn't have any errors - until today. Today two wu errored out with RC=0x100.
My amd crunching box got many wu with quorum=1 and replication=1.
I think I will switch to mcm1 until one of the techs looks into this issue...

[Oct 6, 2014 10:28:55 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Application exited with RC = 0x100

Hi,

Thanks for bringing this up. I will make sure it is passed to our friends at IBM. I am sorry I cannot be of too much assistance here, since I am not an expert in distributed computing, or how the grid is put together internally.

Your Harvard CEP Team
[Oct 6, 2014 5:19:51 PM]
Eric_Kaiser
Veteran Cruncher
Germany (Hessen)
Joined: May 7, 2013
Post Count: 1047
Status: Offline
Re: Application exited with RC = 0x100

As long as someone is taking care of this issue I'm fine.

[Oct 6, 2014 5:40:27 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Application exited with RC = 0x100

Hi.

I've got this error as well, lost just over 12hrs.

[16:12:16] Starting job 6,CPU time has been restored to 25428.768000.
[16:12:16] Starting new Job
[16:12:17] Qink name = fldman
[16:12:28] Qink name = gesman
[16:12:30] Qink name = scfman
Application exited with RC = 0x100
[21:15:02] Finished Job #6
[21:15:02] Starting job 7,CPU time has been restored to 43008.136000.
[21:15:02] Skipping Job #7
21:15:07 (6922): called boinc_finish

Not happy. crying
[Nov 7, 2014 8:34:01 PM]
cjslman
Master Cruncher
Mexico
Joined: Nov 23, 2004
Post Count: 2082
Status: Offline
Re: Application exited with RC = 0x100

I just started my weekend crunch of CEP2 WUs... I hope I will avoid this issue, since I can only crunch this project on weekends. Luck to all.

CJSL

Crunching for a better world...
----------------------------------------
I follow the Gimli philosophy: "Keep breathing. That's the key. Breathe."
Join The Cahuamos Team


[Nov 7, 2014 9:05:49 PM]