Thread Status: Active
Total posts in this thread: 8
This topic has been viewed 1130 times and has 7 replies
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Beta Testing - HCMD Phase 2, v 6.12 May 21, 2009

So I went for a late walkabout and left 2 clients with a cache greater than 2 days, in anticipation after a comment on size testing. On return, I found 20 Betas queued with run times of 5.5 to 7 hours, the first one nearly finished at 1:25. All have a deadline of 2 days.

Thank you.

PS: on the side I got 3 CEP jobs too, though with rush deadlines of 3 days.

Edit: Oops, make that 27, as 7 were already finished.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 2 times, last edit by Sekerob at May 21, 2009 9:43:30 PM]
[May 21, 2009 6:29:48 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Beta Testing - HCMD Phase 2, v 6.12 May 21, 2009

We had better get used to a new, though not alarming, message in the Result Log:

<core_client_version>6.6.28</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Finishing early because max runtime has been exceeded.0
called boinc_finish

</stderr_txt>
]]>

and

<core_client_version>6.6.24</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Finishing early because max runtime has been exceeded.1601933466
called boinc_finish

</stderr_txt>
]]>

It's a guess as to why one has 0 seconds (?) left on termination.
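
For anyone who wants to scan their own result logs for this message, here is a minimal Python sketch. It only looks for the message text exactly as quoted in the two logs above and captures whatever digits follow it. The function name `early_finish_suffix` is made up for illustration, and the meaning of the trailing number is not stated anywhere in this thread, so the sketch returns it as raw text without interpreting it.

```python
import re

# Message text copied from the logs quoted above. The trailing digits are
# captured as-is because their meaning is not stated in this thread.
EARLY_FINISH = re.compile(
    r"Finishing early because max runtime has been exceeded\.(\d*)"
)


def early_finish_suffix(stderr_text):
    """Return the digits after the early-finish message, or None if absent."""
    match = EARLY_FINISH.search(stderr_text)
    return match.group(1) if match else None


sample = """INFO: No state to restore. Start from the beginning.
Finishing early because max runtime has been exceeded.1601933466
called boinc_finish
"""
print(early_finish_suffix(sample))
```

A log without the message gives `None`, which makes it easy to separate results that ran to the end from those that were cut short.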
----------------------------------------
[Edit 1 times, last edit by Sekerob at May 21, 2009 9:43:56 PM]
[May 21, 2009 7:42:37 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Beta Testing - HCMD Phase 2, v 6.12 May 21, 2009

I will be very curious as to how the validation is going to work if one computer in a quorum manages to run the job to the end and the other is cut short for exceeding the maximum run time. The sample below is one where the top result was cut short and the other 2 have a normal log without the 'exceeded' message, seemingly passing the > 60% completion test rule [read somewhere].

BETA_ CMD2_ 0001-GPDAA.clustersOccur-KIF3AA.clustersOccur_ 46_ 2-- 612 Pending Validation 21-5-09 17:38:13 21-5-09 21:13:29 1.00 6.8 / 0.0
BETA_ CMD2_ 0001-GPDAA.clustersOccur-KIF3AA.clustersOccur_ 46_ 1-- 612 Pending Validation 21-5-09 17:37:53 21-5-09 21:30:16 1.50 21.8 / 0.0
BETA_ CMD2_ 0001-GPDAA.clustersOccur-KIF3AA.clustersOccur_ 46_ 0-- 612 Pending Validation 21-5-09 17:37:44 21-5-09 21:08:18 1.37 23.4 / 0.0
----------------------------------------
[Edit 1 times, last edit by Sekerob at May 21, 2009 9:46:55 PM]
[May 21, 2009 9:36:06 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Testing - HCMD Phase 2, v 6.12 May 21, 2009

Result Log

<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
No heartbeat from core client for 30 sec - exiting
Finishing early because max runtime has been exceeded.1284009679
called boinc_finish

</stderr_txt>
]]>

But the result is valid.
[May 22, 2009 6:03:58 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Testing - HCMD Phase 2, v 6.12 May 21, 2009

I came home from work yesterday and found all four quads packed with betas. I had a cache of five days set and had CEP selected so I could finish off the year of runtime I usually do for each project.

Little did I know that CEP is sending very few units out, and the ones it did send were not running because the betas were taking higher priority. I polished off 5 days of runtime in beta units just last night. Each quad computer has 40-50 betas in it, running approximately 1.5 hours each. The dual has about 6 betas in the queue at any one time.

There are a lot of beta units floating around today.

I have not had any of them error out or become inconclusive.

A couple were aborted by the project; I assume they were no longer needed by the server.

k.t.
----------------------------------------
[Edit 1 times, last edit by Former Member at May 22, 2009 1:47:49 PM]
[May 22, 2009 1:42:30 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Beta Testing - HCMD Phase 2, v 6.12 May 21, 2009

I will be very curious as to how the validation is going to work if one computer in a quorum manages to run the job to the end and the other is cut short for exceeding the maximum run time.

This is exactly the question I also had in mind. We will always have different speeds in a quorum. In the easiest case this will only result in different runtimes until the end of the WU. But it gets difficult when one runs to the end and another one gets cut off. And even when all (in production = both?) get cut off, it will certainly be at different numbers of positions calculated. Will the slower machines cause chopping of WUs that faster machines could complete alone? My slowest machine needs 3x more time than my fastest. And even my fastest is slow compared to what others have here.
One way to get rid of the validation problem would be to run with zero redundancy, but this does not necessarily prevent the chopping caused by very slow machines.
It would be very interesting to see how this is solved. Perhaps knreed could provide more insight.
Greetings
Thorsten
[May 22, 2009 3:23:33 PM]
nasher
Veteran Cruncher
USA
Joined: Dec 2, 2005
Post Count: 1422
Status: Offline
Re: Beta Testing - HCMD Phase 2, v 6.12 May 21, 2009

Yes, it will be interesting to see how these play out.

But personally I'm just happy to get beta work. I've got a total of 5 days built up and I want at least my bronze badge... sometime... please...
----------------------------------------

[May 22, 2009 5:13:26 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline
Re: Beta Testing - HCMD Phase 2, v 6.12 May 21, 2009

We are simply validating the number of positions that are computed in common. The validated positions are then archived and child workunits are created in order to complete the workunit.

We are awarding credit based on the total number of positions computed. Thus two results in the same quorum could wind up with different credit awarded.
[May 22, 2009 6:48:49 PM]
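
The scheme knreed describes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual World Community Grid validator: it assumes every result computes positions in the same order from the start, that agreement means exact equality of the per-position values, that nothing is validated if any common position disagrees, and that credit is a flat amount per position computed. The names `validate_quorum` and `credit_per_position` are hypothetical.

```python
def validate_quorum(results, total_positions, credit_per_position=1.0):
    """Sketch of validating only the positions computed in common.

    results maps a result name to the list of position values it returned;
    a result that was cut short simply has a shorter list (assumption).
    """
    # Number of positions that every result in the quorum computed.
    common = min(len(values) for values in results.values())

    # Compare the results position by position over the common prefix only.
    columns = zip(*(values[:common] for values in results.values()))
    agree = all(len(set(column)) == 1 for column in columns)
    validated = common if agree else 0

    # Positions not yet validated would go to a child workunit.
    remaining = range(validated, total_positions)

    # Credit follows the total number of positions each result computed,
    # so results in the same quorum can receive different credit.
    credit = {
        name: len(values) * credit_per_position
        for name, values in results.items()
    }
    return validated, remaining, credit


quorum = {
    "result_0": [11, 12, 13, 14, 15, 16],  # ran to the end
    "result_2": [11, 12, 13],              # cut short at max runtime
}
validated, remaining, credit = validate_quorum(quorum, total_positions=6)
print(validated, list(remaining), credit)
```

In this made-up quorum, three positions are validated, positions 3 to 5 are left for a child workunit, and the result that ran to the end is credited for twice as many positions as the one that was cut short.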