Total posts in this thread: 20
This topic has been viewed 8182 times and has 19 replies
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Problems with type-b batch

OK, problem found: all of the se work units that restart will have this issue. Basically, in the DYNAMICS loop it says @Nstep where it should say @restartnstep. As long as the work unit isn't restarted, it will run to completion. I will be discussing with the researchers tomorrow morning what they wish to do with these. The pe work units appear to be fine; it is only the B.se work units.

-Uplinger
[Mar 11, 2010 4:03:08 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Problems with type-b batch

Basically, if you can let the work units run to completion without stopping or suspending them, they will complete. At this time, checkpointing is not working on these work units. Having the wrong variable makes the work unit start from scratch on the dynamics loop, but the counter for times through the loop keeps going. I will see what the researchers want. In the meantime, if you can, please just let them run without stopping them.

-Uplinger
[Mar 11, 2010 4:07:18 AM]
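
To illustrate the restart problem described in the two posts above, here is a minimal Python sketch. It is not CHARMM input and not the actual work unit script. The `Checkpoint` class, the `steps_to_run` and `total_steps_simulated` functions, the step counts, and the assumption that the restart step count equals the total minus the steps already completed are all hypothetical. They are chosen only to show why asking for the full step count after a restart makes the dynamics start over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    # Dynamics steps completed before the work unit was interrupted (hypothetical field).
    steps_done: int


def steps_to_run(nstep: int, checkpoint: Optional[Checkpoint], buggy: bool) -> int:
    """Return how many dynamics steps a (re)started run asks for.

    Assumption: a correct restart asks only for the remaining steps,
    while the buggy variant asks for the full count again.
    """
    if checkpoint is None:
        # Fresh start: both variants run the full number of steps.
        return nstep
    restart_nstep = nstep - checkpoint.steps_done
    return nstep if buggy else restart_nstep


def total_steps_simulated(nstep: int, interrupted_after: Optional[int], buggy: bool) -> int:
    """Total steps computed across the original run and one optional restart."""
    if interrupted_after is None:
        return steps_to_run(nstep, None, buggy)
    cp = Checkpoint(steps_done=interrupted_after)
    return cp.steps_done + steps_to_run(nstep, cp, buggy)


if __name__ == "__main__":
    print(total_steps_simulated(1000, None, buggy=True))   # never interrupted
    print(total_steps_simulated(1000, 400, buggy=False))   # restart with the remaining count
    print(total_steps_simulated(1000, 400, buggy=True))    # restart with the full count
```

With these made-up numbers, the uninterrupted run and the correctly resumed run both total 1000 steps, while the resumed run using the wrong variable repeats all 1000 steps on top of the 400 already done. That matches the advice above: a work unit that is never stopped is unaffected.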
HutchNYC
Advanced Cruncher
United States
Joined: Nov 27, 2005
Post Count: 97
Status: Offline
Re: Problems with type-b batch

OK. The few se units that BOINC switched away from mid-completion I have now manually suspended until you guys figure out what you want to do with them.

Thanks for looking into this Uplinger.

Hutch
----------------------------------------
[Mar 11, 2010 4:14:06 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Problems with type-b batch

Hutch, can you edit the subject in the first post to say type-b.se? I have confirmed with my own test that it does not affect B.pe work units.

thanks,
-Uplinger
[Mar 11, 2010 4:37:19 AM]
HutchNYC
Advanced Cruncher
United States
Joined: Nov 27, 2005
Post Count: 97
Status: Offline
Re: Problems with type-b batch

Certainly.

Hutch
----------------------------------------
[Mar 11, 2010 4:45:18 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Problems with type-b batch

Whilst eyes shut, 2 pe + 2 se came in though the device had 4 HPF2 crash-outs on the 9th... reliability must have come back up quite quickly :biggrin:. Manually pushed them to front just now (the suspended HCC tasks had no wingman waiting ;>)

This fetch happened when BOINCTasks showed a total of about 1.6 days per core CPU time in the cache. (at time of going into dormant state :o)

PS. Regarding the se stop/start issue of this set, I will make sure they run in one stretch. Backup project suspended.

First checkpoint log entries for result names:

1892 World Community Grid 11-03-2010 07:29:04 [checkpoint_debug] result erlc_d098_se0000_1 checkpointed
1893 World Community Grid 11-03-2010 07:29:12 [checkpoint_debug] result erlc_d099_se0000_1 checkpointed
1894 World Community Grid 11-03-2010 07:29:35 [checkpoint_debug] result erlc_d170_pe0000_1 checkpointed
1895 World Community Grid 11-03-2010 07:30:02 [checkpoint_debug] result erlc_d105_pe0000_1 checkpointed
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Mar 11, 2010 7:30:17 AM]
[Mar 11, 2010 6:32:46 AM]
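
For anyone wanting to pull result names and times out of `[checkpoint_debug]` lines like the four quoted in the post above, a rough Python sketch follows. The regular expression and the `parse_checkpoint_line` helper are my own illustration, not part of BOINC or BOINCTasks, and the day-month-year reading of the date is an assumption based on the post dates in this thread.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official log format): sequence number, project name, timestamp,
# then the checkpoint_debug message carrying the result name.
LINE_RE = re.compile(
    r"^(?P<seq>\d+)\s+(?P<project>.+?)\s+"
    r"(?P<stamp>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"\[checkpoint_debug\] result (?P<result>\S+) checkpointed$"
)


def parse_checkpoint_line(line):
    """Return (result_name, timestamp) for a checkpoint line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # Assumed day-month-year order, i.e. 11-03-2010 is 11 March 2010.
    stamp = datetime.strptime(match.group("stamp"), "%d-%m-%Y %H:%M:%S")
    return match.group("result"), stamp


if __name__ == "__main__":
    sample = ("1892 World Community Grid 11-03-2010 07:29:04 "
              "[checkpoint_debug] result erlc_d098_se0000_1 checkpointed")
    print(parse_checkpoint_line(sample))
```

Lines that do not follow this shape simply return `None`, so the helper can be run over a whole message log without filtering it first.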
X-Files 27
Senior Cruncher
Canada
Joined: May 21, 2007
Post Count: 391
Status: Offline
Re: Problems with type-b batch

Exit code 29 on type-b.se

Result Name: erlc_c123_se0000_1--

<core_client_version>6.10.36</core_client_version>
<![CDATA[
<message>
The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
forrtl: severe (29): file not found, unit 1, file D:\BOINC Data\slots\3\fort.1
Image PC Routine Line Source
wcg_dddt2_charmm_ 00B1638E Unknown Unknown Unknown

Stack trace terminated abnormally.

</stderr_txt>
]]>

----------------------------------------

[Mar 11, 2010 7:25:51 AM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 3010
Status: Offline
Re: Problems with type-b batch

Whilst eyes shut, 2 pe + 2 se came in though the device had 4 HPF2 crash-outs on the 9th... reliability must have come back up quite quickly


Sekerob, although I could well be mistaken, I don't think having a reliable computer is a requirement for receiving B & C type units, just the A's. Thus, those 4 WUs you've got could well be helping you get back up to that lofty status.
----------------------------------------

[Mar 11, 2010 10:16:15 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Problems with type-b batch

Whatever priority they're circulated with suits me; I think the known return time for a device has to be < 2 days.

Meantime the 2 se completed in 1 run, 3:44 and 3:50 hours respectively, both PV of which 1 is quorum complete.

edit: 1 valid

erlc_d099_se0000_1-- 1112084 Valid 10-3-10 23:11:11 11-3-10 10:26:44 3.66 67.5 / 69.9

The pe tasks are solidly on track to do the full 10 hours. RAM peak at 46% showing 24Mb and VM stuck on max of 540Mb. Kernel time 15 seconds, PF Soft: 31k, Delta zero. Very nice.
----------------------------------------
[Edit 1 times, last edit by Sekerob at Mar 11, 2010 11:00:55 AM]
[Mar 11, 2010 10:30:15 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Problems with type-b batch

You will start seeing some new work units, erls_*se0000. These are to make sure that the results from the work units above are proper. Future erlc*se0000 work units will have the checkpoint fix in them and won't require being sent as erls*se0000.

If you can let the current se0000 work units run from start to finish without stopping, then let them run. If you are not able to do this on your machine, then please abort the work unit to let another member complete it.

-Uplinger
[Mar 11, 2010 9:42:29 PM]
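
As a small illustration of the naming patterns mentioned in the post above, the Python sketch below sorts result names into the erls and erlc se0000 groups using shell-style wildcards. The `classify` function and its labels are hypothetical, the trailing `*` is my addition so the patterns also match the `_1` style suffix seen on result names in this thread, and `erls_d098_se0000_0` is a made-up name used only for the demonstration. Note that a name alone cannot tell a current erlc se0000 work unit from a future one with the checkpoint fix.

```python
from fnmatch import fnmatchcase


def classify(result_name):
    """Group a result name by the wildcard patterns quoted in the thread."""
    # Patterns from the post, with a trailing * added (an assumption)
    # to allow for the replica suffix at the end of result names.
    if fnmatchcase(result_name, "erls_*se0000*"):
        return "erls se0000 (verification)"
    if fnmatchcase(result_name, "erlc*se0000*"):
        return "erlc se0000"
    return "other"


if __name__ == "__main__":
    names = [
        "erlc_d098_se0000_1",   # taken from the log lines in this thread
        "erlc_d170_pe0000_1",   # taken from the log lines in this thread
        "erls_d098_se0000_0",   # hypothetical name for illustration
    ]
    for name in names:
        print(name, "->", classify(name))
```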