Thread Status: Active
Total posts in this thread: 118
This topic has been viewed 25932 times and has 117 replies.
vepaul
Senior Cruncher
Belgium
Joined: Nov 17, 2004
Post Count: 261
Status: Offline
Re: Clean Energy Project - Phase 2 BETA test new workunits - Aug 15, 2014 [ Issues Thread ]

No, it was on the same machine, as far as I can see.
[Aug 17, 2014 2:24:38 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

That would be an absolute first in a quorum 2 or greater distribution, and that is after more than 2 billion results at WCG. The Result Status page, without drilling into the detailed distribution, shows the different device names, as you had both the _0 and _1 copies. Since you can have multiple devices with the same name (WCG does not care), you would have to hover the mouse over the device names to see the actual unique device ID.
[Aug 17, 2014 2:36:02 PM]
branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Status: Offline

Got 2 on my i5-2500S (Mac OS X 10.9.4, BOINC 7.0.65); one has already errored out:

Result Name: BETA_E225106_60_S.324.C38H26N10.YFHQHCVYIOUKJP-UHFFFAOYSA-N.14_s1_14_3


<core_client_version>7.0.65</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[17:04:04] Number of jobs = 8
[17:04:04] Starting job 0,CPU time has been restored to 0.000000.
[17:04:05] Starting new Job
[17:04:05] Qink name = fldman
[17:04:07] Qink name = gesman
[17:04:08] Qink name = scfman
[18:11:21] Qink name = anlman
[18:11:21] Qink name = drvman
Application exited with RC = 0x100
[18:12:38] Finished Job #0
called boinc_finish

</stderr_txt>
]]>


ETA1: also the second one:

Result Name: BETA_E225106_259_S.326.C41H29N5S1.TZUBUZMEZPCNHP-UHFFFAOYSA-N.1_s1_14_9


<core_client_version>7.0.65</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[18:10:24] Number of jobs = 8
[18:10:24] Starting job 0,CPU time has been restored to 0.000000.
[18:10:24] Starting new Job
[18:10:24] Qink name = fldman
[18:10:25] Qink name = gesman
[18:10:26] Qink name = scfman
[19:04:11] Qink name = anlman
[19:04:11] Qink name = drvman
Application exited with RC = 0x100
[19:05:13] Finished Job #0
called boinc_finish

</stderr_txt>
]]>


3rd one is running, but I am not holding my breath.
----------------------------------------

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006

----------------------------------------
[Edit 1 times, last edit by branjo at Aug 17, 2014 6:38:05 PM]
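For anyone wanting to read timings out of a stderr excerpt like the two above, here is a minimal Python sketch. It assumes every line of interest starts with an [HH:MM:SS] stamp and that each stage lasts until the next stamped line. The function names are made up for illustration and are not part of BOINC or the CEP2 application. The second helper only shows the arithmetic behind "195 (0xc3, -61)": the same byte read as a signed 8-bit value.

```python
import re
from datetime import datetime

# Excerpt copied from the first stderr_txt block in this post.
STDERR = """\
[17:04:05] Qink name = fldman
[17:04:07] Qink name = gesman
[17:04:08] Qink name = scfman
[18:11:21] Qink name = anlman
[18:11:21] Qink name = drvman
[18:12:38] Finished Job #0
"""

# Assumed line shape: "[HH:MM:SS] message".
STAMPED = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+(.*)$")


def stage_durations(text):
    """Return (message, seconds until the next stamped line) pairs."""
    events = []
    for raw in text.splitlines():
        match = STAMPED.match(raw.strip())
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            events.append((when, match.group(2)))
    durations = []
    for (start, label), (end, _next_label) in zip(events, events[1:]):
        seconds = int((end - start).total_seconds())
        if seconds < 0:
            # Assumption: a negative gap means the clock passed midnight.
            seconds += 24 * 3600
        durations.append((label, seconds))
    return durations


def as_signed_byte(code):
    """Interpret an exit code in 0..255 as a signed 8-bit integer."""
    return code - 256 if code > 127 else code


if __name__ == "__main__":
    for label, seconds in stage_durations(STDERR):
        print(f"{label}: {seconds} s")
    print(as_signed_byte(195))
```

On this excerpt almost all of the time (about 67 minutes) falls between the scfman line and the anlman line.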
[Aug 17, 2014 4:14:56 PM]
vepaul
Senior Cruncher
Belgium
Joined: Nov 17, 2004
Post Count: 261
Status: Offline

Quote:
That would be an absolute first in a quorum 2 or greater distribution, and that is after more than 2 billion results at WCG. The Result Status page, without drilling into the detailed distribution, shows the different device names, as you had both the _0 and _1 copies. Since you can have multiple devices with the same name (WCG does not care), you would have to hover the mouse over the device names to see the actual unique device ID.


I have only 2 devices working, and only one of them uses its GPU.
Why the other one does not run CEP2 is a mystery: both run Windows 7.
[Aug 18, 2014 12:03:32 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

We have added another 1000 workunits to the mix. These should fix the errors caused by the maximum memory being used.

Thanks,
-Uplinger
[Aug 18, 2014 5:44:36 PM]
branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892
Status: Offline

Got 16 of them on my Mac only - Win rig and Linux clouds went dry.
----------------------------------------

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006

[Aug 18, 2014 6:49:56 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

Three have just exited with lines like this in the Event Log:
BETA_E225108_59_S.328.C41H25N7O1.JFONBYKDRGSVKP-UHFFFAOYSA-N.1_s1_14_0 exited with zero status but no 'finished' file.
[Aug 18, 2014 7:22:15 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Tony, what you are seeing should be results that exit because they do not converge. Going forward, these should be marked as valid, as the data is still useful to the researchers. You should see them go into the Pending Validation state.

Thanks,
-Uplinger
[Aug 18, 2014 8:01:59 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline

Thanks, Keith. Things are stranger than I first thought. This is on an i5-750, 4 cores, Win7, 16GB DDR3. All 4 cores commenced Beta CEP2 units within seconds of each other (risky, I know, and possibly the cause of some of these outcomes):

18/08/2014 18:45:35 | World Community Grid | Starting task BETA_E225108_530_S.328.C44H29N5.ONXAAYOVMMZRSK-UHFFFAOYSA-N.14_s1_14_1
18/08/2014 18:45:36 | World Community Grid | Starting task BETA_E225108_588_S.328.C42H26N6O1.JXTUBXMYSMVBOD-UHFFFAOYSA-N.15_s1_14_0
18/08/2014 18:45:36 | World Community Grid | Starting task BETA_E225108_655_S.328.C42H26N6O1.BXDZVSAWISJAKK-UHFFFAOYSA-N.4_s1_14_0
18/08/2014 18:45:39 | World Community Grid | Starting task BETA_E225108_587_S.328.C42H26N6O1.JXTUBXMYSMVBOD-UHFFFAOYSA-N.14_s1_14_0

Then 2 finished somewhat early, with RC = 0x1 still in Job#0, and 2 others commenced:

18/08/2014 20:05:12 | World Community Grid | Computation for task BETA_E225108_588_S.328.C42H26N6O1.JXTUBXMYSMVBOD-UHFFFAOYSA-N.15_s1_14_0 finished
18/08/2014 20:05:29 | World Community Grid | Computation for task BETA_E225108_655_S.328.C42H26N6O1.BXDZVSAWISJAKK-UHFFFAOYSA-N.4_s1_14_0 finished
... uploads ...
18/08/2014 20:05:56 | World Community Grid | Starting task BETA_E225108_59_S.328.C41H25N7O1.JFONBYKDRGSVKP-UHFFFAOYSA-N.1_s1_14_0
18/08/2014 20:05:56 | World Community Grid | Starting task BETA_E225108_597_S.328.C42H26N6O1.SLVWIZBQTZWXHB-UHFFFAOYSA-N.4_s1_14_0
... uploads ...

Shortly afterwards, the 2 new units suffered this, simultaneously with another initial task finishing, again with RC=0x1 in Job#0:

18/08/2014 20:07:49 | World Community Grid | Task BETA_E225108_59_S.328.C41H25N7O1.JFONBYKDRGSVKP-UHFFFAOYSA-N.1_s1_14_0 exited with zero status but no 'finished' file
18/08/2014 20:07:49 | World Community Grid | If this happens repeatedly you may need to reset the project.
18/08/2014 20:07:49 | World Community Grid | Task BETA_E225108_597_S.328.C42H26N6O1.SLVWIZBQTZWXHB-UHFFFAOYSA-N.4_s1_14_0 exited with zero status but no 'finished' file
18/08/2014 20:07:49 | World Community Grid | If this happens repeatedly you may need to reset the project.
18/08/2014 20:07:49 | World Community Grid | Computation for task BETA_E225108_587_S.328.C42H26N6O1.JXTUBXMYSMVBOD-UHFFFAOYSA-N.14_s1_14_0 finished

A minute later, the last of the initial units suffered this, simultaneously with another unit starting:

18/08/2014 20:08:46 | World Community Grid | Task BETA_E225108_530_S.328.C44H29N5.ONXAAYOVMMZRSK-UHFFFAOYSA-N.14_s1_14_1 exited with zero status but no 'finished' file
18/08/2014 20:08:46 | World Community Grid | If this happens repeatedly you may need to reset the project.
18/08/2014 20:08:46 | World Community Grid | Starting task BETA_E225108_842_S.328.C40F3H23N2O3.RKUKECOGVQQHGS-UHFFFAOYSA-N.15_s1_14_0

Results Status shows this:
BETA_E225108_587_S.328.C42H26N6O1.JXTUBXMYSMVBOD-UHFFFAOYSA-N.14_s1_14_0 | Tree3 | Pending Validation | 18/08/14 17:45:29 | 18/08/14 19:14:02 | 1.21 / 1.36 | 43.8 / 0.0
BETA_E225108_588_S.328.C42H26N6O1.JXTUBXMYSMVBOD-UHFFFAOYSA-N.15_s1_14_0 | Tree3 | Pending Validation | 18/08/14 17:45:29 | 18/08/14 19:14:02 | 1.17 / 1.33 | 42.8 / 0.0
BETA_E225108_655_S.328.C42H26N6O1.BXDZVSAWISJAKK-UHFFFAOYSA-N.4_s1_14_0 | Tree3 | Pending Validation | 18/08/14 17:45:29 | 18/08/14 19:14:02 | 1.17 / 1.33 | 42.8 / 0.0

What I find strange is that the system is currently processing _530_, _59_, _597_, _842_; the first 3 of those are the units that "exited with zero status but no 'finished' file", implying that they restarted. Is that normal??

So some of this may just be overloading of my system, but I've elaborated on it in case there's anything of value in reporting this behaviour.
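To pick out which tasks hit the warning and were then started again, the Event Log lines can be grouped per task. The following is a rough Python sketch, assuming the "date time | project | message" layout shown above and only the three message shapes quoted in this post. The pattern names and the summarize function are invented for illustration.

```python
import re
from collections import defaultdict

# Lines copied from the Event Log excerpts above (two tasks only).
LOG = """\
18/08/2014 18:45:35 | World Community Grid | Starting task BETA_E225108_530_S.328.C44H29N5.ONXAAYOVMMZRSK-UHFFFAOYSA-N.14_s1_14_1
18/08/2014 18:45:39 | World Community Grid | Starting task BETA_E225108_587_S.328.C42H26N6O1.JXTUBXMYSMVBOD-UHFFFAOYSA-N.14_s1_14_0
18/08/2014 20:07:49 | World Community Grid | Computation for task BETA_E225108_587_S.328.C42H26N6O1.JXTUBXMYSMVBOD-UHFFFAOYSA-N.14_s1_14_0 finished
18/08/2014 20:08:46 | World Community Grid | Task BETA_E225108_530_S.328.C44H29N5.ONXAAYOVMMZRSK-UHFFFAOYSA-N.14_s1_14_1 exited with zero status but no 'finished' file
18/08/2014 20:08:46 | World Community Grid | If this happens repeatedly you may need to reset the project.
"""

# Message shapes taken from the log lines quoted in this thread.
PATTERNS = {
    "started": re.compile(r"^Starting task (\S+)"),
    "finished": re.compile(r"^Computation for task (\S+) finished"),
    "no_finish_file": re.compile(
        r"^Task (\S+) exited with zero status but no 'finished' file"
    ),
}


def summarize(log_text):
    """Map each task name to the ordered list of event kinds seen for it."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        parts = line.split(" | ", 2)
        if len(parts) != 3:
            continue  # not a "time | project | message" line
        message = parts[2].strip()
        for kind, pattern in PATTERNS.items():
            match = pattern.match(message)
            if match:
                events[match.group(1)].append(kind)
                break
    return dict(events)


if __name__ == "__main__":
    for task, kinds in summarize(LOG).items():
        print(task, kinds)
```

A task whose list contains "no_finish_file" followed later by another "started" would be one that the client ran again.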
[Aug 18, 2014 8:58:19 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline

Tony,

These work units are larger than a lot of the others. They all have over 300 electrons in them, which makes them harder to converge. That is why you are seeing them exit sooner than you would expect.

The "exited with zero status but no 'finished' file" message may come from something within the science code that exits the application with status 0 (success) but does not write the boinc_finish file. This is not critical, just a misleading warning, and it probably has to do with the exit path taken when a job does not converge.

Please note that I have the validator turned off right now while we await instructions from the researchers on a valid convergence threshold for validation.

Thanks,
-Uplinger
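The check described above can be pictured with a small Python model. This is only a sketch of the idea (exit status 0 plus a missing finish marker produces the warning), not the BOINC client's actual code. The marker file name "boinc_finish_called" and the classify_exit function are assumptions made for illustration.

```python
import os
import tempfile

# Assumed name of the marker file the application leaves when it
# finishes through boinc_finish; treat it as a placeholder.
FINISH_MARKER = "boinc_finish_called"


def classify_exit(exit_status, slot_dir, marker=FINISH_MARKER):
    """Classify a task exit from its status and the finish marker."""
    marker_present = os.path.exists(os.path.join(slot_dir, marker))
    if exit_status == 0 and marker_present:
        return "finished"
    if exit_status == 0:
        # Success status without the marker: the case behind the warning.
        return "zero status but no 'finished' file"
    return "error"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as slot:
        print(classify_exit(0, slot))    # marker missing
        open(os.path.join(slot, FINISH_MARKER), "w").close()
        print(classify_exit(0, slot))    # marker present
        print(classify_exit(195, slot))  # non-zero exit status
```

In this model the science code's early exit on non-convergence lands in the middle branch, which would fit the tasks being picked up again rather than reported as errors.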
[Aug 18, 2014 9:04:09 PM]