Total posts in this thread: 90
This topic has been viewed 8511 times and has 89 replies
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1317
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

Had to cancel WU
BETA_E236438_410_S.384.C35F2H10N6O5S4.MSQQQGDDEMBVSW-UHFFFAOYSA-N.1_s1_14
under Windows 8.1.

Reason: It ignores TThrottle's requests to slow down to keep the processor from overheating.
Consequence: The beta WU runs at full speed and the processor is 10 °C above the requested limit, even though the other threads have been reduced to almost no activity.

Interesting discovery, Jean.

I tested it on my Win7 desktop and saw the same behavior with TThrottle.
It means that wcgrid_beta11_qchem_prod_win32.exe.7.00 is not treated as a child process of the wcgrid_beta11_7.00_windows_intelx86 wrapper process.
Manually adding the wcgrid_beta11_qchem_prod_win32.exe.7.00 process worked, and the CPU usage was throttled.
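
One way to picture why the throttle misses the science app: a tool that slows only a wrapper and its descendants will skip any process whose parent chain does not lead back to that wrapper. The Python sketch below uses a made-up process table (the PIDs and parent links are hypothetical, not measured on a real machine), and `descendants` is an illustrative helper, not something TThrottle or BOINC provides.

```python
from collections import defaultdict

def descendants(table, root_pid):
    """Return the set of PIDs reachable from root_pid via parent -> child links."""
    children = defaultdict(list)
    for pid, ppid, _name in table:
        children[ppid].append(pid)
    found, stack = set(), [root_pid]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# Hypothetical snapshot: (pid, parent pid, executable name).
WRAPPER = "wcgrid_beta11_7.00_windows_intelx86"
QCHEM = "wcgrid_beta11_qchem_prod_win32.exe.7.00"
table = [
    (100, 1, "boinc.exe"),
    (200, 100, WRAPPER),
    (300, 1, QCHEM),  # assumed: not parented by the wrapper
]

# Everything a "wrapper plus children" throttle would cover.
throttled = {200} | descendants(table, 200)
names = {name for pid, _ppid, name in table if pid in throttled}
print(QCHEM in names)
```

Under that assumed layout the science executable falls outside the throttled set, which would match the need to add it by hand.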
[Mar 17, 2016 6:35:21 AM]
KLiK
Master Cruncher
Croatia
Joined: Nov 13, 2006
Post Count: 3108
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

Still testing:
BETA_E236440_112_S.458.C54H24N2S6.LEEBCYNHZUKVBD-UHFFFAOYSA-N.7_s1_14_1-- | T60 | In Progress | 3/16/16 23:44:09 | 3/20/16 23:44:09 | 0.00 / 0.00 | 0.0 / 0.0
BETA_E236440_387_S.448.C45F2H14N6O5S4.DSBOADLGQJABJM-UHFFFAOYSA-N.1_s1_14_1-- | DG33FB | In Progress | 3/16/16 23:15:55 | 3/20/16 23:15:55 | 0.00 / 0.00 | 0.0 / 0.0
BETA_E236438_37_S.392.C42H18N6O2S4.PQIVMVBSHAZUSL-UHFFFAOYSA-N.18_s1_14_1-- | VS4 | In Progress | 3/16/16 21:57:31 | 3/20/16 21:57:31 | 0.00 / 0.00 | 0.0 / 0.0
BETA_E236438_309_S.392.C44F2H18N4S4.LRMXUPUWGMMTBY-UHFFFAOYSA-N.10_s1_14_0-- | DP35DP | In Progress | 3/16/16 21:55:02 | 3/20/16 21:55:02 | 0.00 / 0.00 | 0.0 / 0.0
----------------------------------------
oldies:UDgrid.org & PS3 Life@home


non-profit org. Play4Life in Zagreb, Croatia
[Mar 17, 2016 7:16:08 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

The repair job I mentioned earlier has resulted in Valid for _0 and for my _2, and Invalid for _1, which exited in Job #0:

BETA_E236437_292_S.356.C33H12N6O1S6.DUKWITXLJHZVGL-UHFFFAOYSA-N.11_s1_14_2-- | Microsoft Windows 10 x64 Edition, (10.00.10586.00) | 700 | Valid | 16/03/16 12:18:56 | 17/03/16 01:01:32 | 6.51 | 219.5 / 198.4
BETA_E236437_292_S.356.C33H12N6O1S6.DUKWITXLJHZVGL-UHFFFAOYSA-N.11_s1_14_1-- | Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) | 700 | Invalid | 15/03/16 20:29:39 | 16/03/16 03:56:25 | 0.86 | 24.4 / 24.4
BETA_E236437_292_S.356.C33H12N6O1S6.DUKWITXLJHZVGL-UHFFFAOYSA-N.11_s1_14_0-- | Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) | 700 | Valid | 15/03/16 20:29:36 | 16/03/16 12:18:51 | 6.86 | 177.2 / 198.4

My Result Log:

Result Name: BETA_E236437_292_S.356.C33H12N6O1S6.DUKWITXLJHZVGL-UHFFFAOYSA-N.11_s1_14_2--
<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[18:18:40] Number of jobs = 5
[18:18:40] Starting job 0,CPU time has been restored to 0.000000.
[21:36:22] Finished Job #0
[21:36:22] Starting job 1,CPU time has been restored to 11561.703125.
[22:40:22] Finished Job #1
[22:40:22] Starting job 2,CPU time has been restored to 15308.000000.
[22:51:17] Finished Job #2
[22:51:17] Starting job 3,CPU time has been restored to 15944.328125.
Application exited with RC = 0x1
[00:59:24] Finished Job #3
[00:59:24] Starting job 4,CPU time has been restored to 23421.171875.
[00:59:24] Skipping Job #4
00:59:26 (5868): called boinc_finish
</stderr_txt>

I've now also received beta units from batches E236438, E236440 and E236441.
[Mar 17, 2016 7:57:36 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

Without trying, I got a 437 yesterday, and overnight 3 more from 439-440 queued up behind some fresh FAHB that are somehow in a hurry. The betas are _0 and _1 copies. We'll see what comes out.

Edit: BTW, yesterday's unit ended with the predominant RC = 0x1 and is still a valid result.

Result Name: BETA_E236437_142_S.358.C32H14N8S6.QNORDXTXHQLTHG-UHFFFAOYSA-N.12_s1_14_0--
<core_client_version>7.6.2</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[13:25:00] Number of jobs = 5
[13:25:00] Starting job 0,CPU time has been restored to 0.000000.
[17:27:26] Finished Job #0
[17:27:26] Starting job 1,CPU time has been restored to 14267.656250.
[18:11:23] Finished Job #1
[18:11:23] Starting job 2,CPU time has been restored to 16883.156250.
[18:25:55] Finished Job #2
[18:25:55] Starting job 3,CPU time has been restored to 17740.765625.
Application exited with RC = 0x1
[21:19:30] Finished Job #3
[21:19:30] Starting job 4,CPU time has been restored to 28069.062500.
[21:19:30] Skipping Job #4
21:19:33 (4552): called boinc_finish

</stderr_txt>
]]>
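
The per-job wall-clock times can be read straight off a log like this one. Below is a rough Python sketch, assuming the `[HH:MM:SS] Starting job N` and `Finished Job #N` line shapes shown above and that a job crosses midnight at most once. `job_durations` is a hypothetical helper, not part of BOINC, and for a job that was restarted it measures only from the last start.

```python
import re
from datetime import datetime, timedelta

START = re.compile(r"\[(\d\d:\d\d:\d\d)\] Starting job (\d+),")
FINISH = re.compile(r"\[(\d\d:\d\d:\d\d)\] Finished Job #(\d+)")

def job_durations(log_text):
    """Map job number -> wall-clock timedelta between its start and finish lines."""
    started, durations = {}, {}
    for line in log_text.splitlines():
        if m := START.search(line):
            started[int(m.group(2))] = datetime.strptime(m.group(1), "%H:%M:%S")
        elif m := FINISH.search(line):
            job = int(m.group(2))
            if job not in started:
                continue
            delta = datetime.strptime(m.group(1), "%H:%M:%S") - started[job]
            if delta < timedelta(0):  # finish time wrapped past midnight
                delta += timedelta(days=1)
            durations[job] = delta
    return durations

# Lines copied from the log above (jobs 0, 1 and 3 only).
log = """\
[13:25:00] Starting job 0,CPU time has been restored to 0.000000.
[17:27:26] Finished Job #0
[17:27:26] Starting job 1,CPU time has been restored to 14267.656250.
[18:11:23] Finished Job #1
[18:25:55] Starting job 3,CPU time has been restored to 17740.765625.
Application exited with RC = 0x1
[21:19:30] Finished Job #3
"""
for job, wall in job_durations(log).items():
    print(job, wall)
```

A skipped job (such as Job #4 here) has no `Finished` line and is simply left out of the result.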
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Mar 17, 2016 9:38:04 AM]
[Mar 17, 2016 9:29:06 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

Just discovered that the 4770 has all 8 cores running 437-438 units with nothing else buffered, even though this is the dedicated OET machine [no special attempt to fetch Beta]. The buffer is set to 0.75 days, but after 1:01 to 1:43 hours of running, all 8 betas show a TTC of 1 day 16 hours or more, i.e. 40+ hours remaining. No wonder nothing is sitting in the queue, which brings up a long-standing issue. Everyone knows these things won't ever run longer than 18 hours. Can't this monkey be taught the trick of wearing a cap that says "18 and no more we do, mama"? I can even think of an app_config solution to achieve this.
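
The empty queue follows from simple arithmetic if the client compares estimated remaining work per core with the buffer size. The Python sketch below is a deliberately simplified model, not BOINC's actual work-fetch logic. `wants_more_work`, `capped` and the way the 18-hour cap is applied are illustrative assumptions built on the figures in this post.

```python
def wants_more_work(remaining_hours, n_cores, buffer_days):
    """Simplified model: fetch only if queued work per core is below the buffer."""
    queued_per_core = sum(remaining_hours) / n_cores
    return queued_per_core < buffer_days * 24

def capped(remaining_hours, elapsed_hours, cap_hours=18.0):
    """Clamp each estimate so elapsed + remaining never exceeds cap_hours."""
    return [min(r, max(cap_hours - e, 0.0))
            for r, e in zip(remaining_hours, elapsed_hours)]

# Figures from the post: 8 tasks, about 40 h remaining each, 0.75-day buffer.
remaining = [40.0] * 8
elapsed = [1.5] * 8  # roughly 1:01 to 1:43 hours of run time

print(wants_more_work(remaining, 8, 0.75))                   # uncapped estimates
print(wants_more_work(capped(remaining, elapsed), 8, 0.75))  # with the 18 h cap
```

In this model 40 hours per core already exceeds the 18-hour buffer, so nothing is fetched, while the capped estimate of 16.5 hours leaves room for more work.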
[Mar 17, 2016 10:02:37 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

With my 4770K, I'm using app_config to run only 4 CEP2 Beta at a time, but that only prolongs the agony of overestimated Time to Completion, which is currently 47 hours for each Beta unit, versus actual completion times so far of 6 to 11 hours. I'm micro-managing the queue in order to keep some OET available (switching briefly at least twice per day to a profile for OET instead of Beta, and adjusting queue size to suit).
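
For reference, a limit like this is normally expressed with `max_concurrent` in an `app_config.xml` file in the project directory. The Python sketch below only generates the text of such a file. The application name `beta11` is an assumption inferred from the executable names quoted in this thread and should be checked against the client's own records before use; `build_app_config` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text limiting how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")

# "beta11" is assumed, not confirmed by the thread.
print(build_app_config("beta11", 4))
```

The client has to re-read its config files (or be restarted) before a change like this takes effect.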
[Mar 17, 2016 10:30:05 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

My one WU checkpointed at about 4 hours. I suspended and restarted it twice (LAIM off) with no problem. The initial estimated time was about 6 hours. At the 4-hour mark it still showed 5 hours to finish. It ended about 45 minutes later, after completing job 3.

Ran on my iMac, OS X 10.11.3

Result Log

Result Name: BETA_E236439_529_S.420.C42H16N8S6.GIGTUVLDTLZZPV-UHFFFAOYSA-N.18_s1_14_1--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[17:43:16] Number of jobs = 5
[17:43:16] Starting job 0,CPU time has been restored to 0.000000.
[17:43:17] Starting new Job
[17:43:17] Qink name = fldman
[17:43:19] Qink name = gesman
[17:43:20] Qink name = scfman
[18:28:38] Qink name = anlman
[18:28:38] Qink name = drvman
[18:30:23] Qink name = optman
[18:30:23] Qink name = fldman
[18:30:23] Qink name = gesman
[18:30:25] Qink name = scfman
[18:41:46] Qink name = anlman
[18:41:46] Qink name = drvman
[18:43:29] Qink name = optman
[18:43:30] Qink name = fldman
[18:43:30] Qink name = gesman
[18:43:31] Qink name = scfman
[18:54:51] Qink name = anlman
[18:54:51] Qink name = drvman
[18:56:35] Qink name = optman
[18:56:35] Qink name = fldman
[18:56:35] Qink name = gesman
[18:56:37] Qink name = scfman
[19:07:00] Qink name = anlman
[19:07:00] Qink name = drvman
[19:08:44] Qink name = optman
[19:08:44] Qink name = fldman
[19:08:44] Qink name = gesman
[19:08:46] Qink name = scfman
[19:19:17] Qink name = anlman
[19:19:17] Qink name = drvman
[19:21:01] Qink name = optman
[19:21:01] Qink name = fldman
[19:21:01] Qink name = gesman
[19:21:03] Qink name = scfman
[19:30:43] Qink name = anlman
[19:30:43] Qink name = drvman
[19:32:27] Qink name = optman
[19:32:27] Qink name = fldman
[19:32:27] Qink name = gesman
[19:32:29] Qink name = scfman
[19:42:12] Qink name = anlman
[19:42:12] Qink name = drvman
[19:43:38] Qink name = optman
[19:43:38] Qink name = fldman
[19:43:38] Qink name = gesman
[19:43:40] Qink name = scfman
[19:53:20] Qink name = anlman
[19:53:20] Qink name = drvman
[19:55:06] Qink name = optman
[19:55:07] Qink name = fldman
[19:55:07] Qink name = gesman
[19:55:09] Qink name = scfman
[20:05:18] Qink name = anlman
[20:05:18] Qink name = drvman
[20:07:02] Qink name = optman
[20:07:03] Qink name = fldman
[20:07:03] Qink name = gesman
[20:07:04] Qink name = scfman
[20:17:43] Qink name = anlman
[20:17:43] Qink name = drvman
[20:19:26] Qink name = optman
[20:19:27] Qink name = fldman
[20:19:27] Qink name = gesman
[20:19:29] Qink name = scfman
[20:30:07] Qink name = anlman
[20:30:07] Qink name = drvman
[20:31:50] Qink name = optman
[20:31:50] Qink name = fldman
[20:31:50] Qink name = gesman
[20:31:52] Qink name = scfman
[20:41:26] Qink name = anlman
[20:41:26] Qink name = drvman
[20:43:09] Qink name = optman
[20:43:09] Qink name = fldman
[20:43:09] Qink name = gesman
[20:43:11] Qink name = scfman
[20:52:34] Qink name = anlman
[20:52:34] Qink name = drvman
[20:54:17] Qink name = optman
[20:54:17] Qink name = fldman
[20:54:17] Qink name = gesman
[20:54:19] Qink name = scfman
[21:02:56] Qink name = anlman
[21:02:56] Qink name = drvman
[21:04:39] Qink name = optman
[21:04:39] Qink name = fldman
[21:04:39] Qink name = gesman
[21:04:41] Qink name = scfman
[21:12:44] Qink name = anlman
[21:12:44] Qink name = drvman
[21:14:27] Qink name = optman
[21:14:27] Qink name = fldman
[21:14:27] Qink name = gesman
[21:14:29] Qink name = scfman
[21:21:38] Qink name = anlman
[21:21:38] Qink name = drvman
[21:23:21] Qink name = optman
[21:23:21] Qink name = anlman
[21:31:31] End of Job
[21:31:38] Finished Job #0
[21:31:38] Starting job 1,CPU time has been restored to 13113.609595.
[21:31:41] Starting new Job
[21:31:41] Qink name = fldman
[21:31:43] Qink name = gesman
[21:31:43] Qink name = scfman
[21:44:46] Qink name = anlman
[21:52:22] End of Job
[21:52:29] Finished Job #1
[21:52:29] Starting job 2,CPU time has been restored to 14333.385774.
[21:52:32] Starting new Job
[21:52:33] Qink name = fldman
[21:52:34] Qink name = gesman
[21:52:34] Qink name = scfman
[22:03:11] Qink name = anlman
[22:11:15] End of Job
[22:11:22] Finished Job #2
[22:11:22] Starting job 3,CPU time has been restored to 15431.434208.
[22:11:25] Starting new Job
[22:11:26] Qink name = fldman
[22:11:35] Qink name = gesman
[22:11:37] Qink name = scfman
Quit requested: Exiting
[22:12:47] Number of jobs = 5
[22:12:47] Starting job 3,CPU time has been restored to 15431.434208.
[22:12:51] Starting new Job
[22:12:51] Qink name = fldman
[22:13:00] Qink name = gesman
[22:13:02] Qink name = scfman
Quit requested: Exiting
[22:29:53] Number of jobs = 5
[22:29:53] Starting job 3,CPU time has been restored to 15431.434208.
[22:29:57] Starting new Job
[22:29:57] Qink name = fldman
[22:30:05] Qink name = gesman
[22:30:06] Qink name = scfman
Application exited with RC = 0x100
[22:57:55] Finished Job #3
[22:57:55] Starting job 4,CPU time has been restored to 17070.930537.
[22:57:55] Skipping Job #4
called boinc_finish

</stderr_txt>
]]>
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 17, 2016 10:48:30 AM]
[Mar 17, 2016 10:44:47 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

If anything is an indicator, the slot info says the app is v7.00, the same as used in production. Don't think versioning is easily compromised with the way development works.

wcgrid_beta11_7.00_windows_intelx86

Got 8 running concurrently on the W8.1-64/4770, showing 96.9-97.4% efficiency after 2:09 to 2:51 hours. That's all without any checkpoint having been recorded, so it could plummet when those start happening [it did a few weeks ago on my Linux box with 4 concurrent: all fine and dandy at >98% efficiency, and I was awestruck by the just-upgraded 4.2.1 LTS kernel, but then it fell rapidly and all finished at 92-93%... which was the normal of old].
[Mar 17, 2016 11:02:14 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

For another data point, my W10-64/4770 running just 4 at a time is giving 96% or 97% efficiency.
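
Assuming the efficiency figures quoted here mean CPU time divided by elapsed wall-clock time (an assumption about how the monitoring tool defines it), the number can be checked against any stderr log in this thread. The Python sketch below uses job 0 of the log quoted earlier that starts at 13:25:00: it finished at 17:27:26, and the next job reports 14267.656250 s of restored CPU time, read here as the cumulative CPU time after job 0. `efficiency` is a hypothetical helper.

```python
def hms_to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def efficiency(cpu_seconds, start_hms, end_hms):
    """CPU time as a percentage of wall-clock time (same-day start and end)."""
    wall = hms_to_seconds(end_hms) - hms_to_seconds(start_hms)
    return 100.0 * cpu_seconds / wall

print(round(efficiency(14267.656250, "13:25:00", "17:27:26"), 1))
```

That single job works out to roughly 98%, in the same range as the figures reported in these posts.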
[Mar 17, 2016 11:41:25 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2104
Status: Offline
Re: Clean Energy Project - Phase 2 Beta March 15, 2016 [Issues Thread]

Did a test with suspension of a BETA WU (LAIM on).

BETA_E236437_655_S.364.C42H20N4S4.MZVSYBDUMBXWRY-UHFFFAOYSA-N.19_s1_14_1--

Suspended the WU during Qink 'anlman' in Job #2; when the WU restarted, it began again at the first Qink, 'fldman', of Job #2.
...
[10:50:34] Finished Job #0
[10:50:34] Starting job 1,CPU time has been restored to 12290.427041.
[10:50:35] Starting new Job
[10:50:35] Qink name = fldman
[10:50:37] Qink name = gesman
[10:50:37] Qink name = scfman
[11:07:23] Qink name = anlman
[11:09:46] End of Job
[11:09:48] Finished Job #1
[11:09:48] Starting job 2,CPU time has been restored to 13063.344679.
[11:09:50] Starting new Job
[11:09:50] Qink name = fldman
[11:09:51] Qink name = gesman
[11:09:51] Qink name = scfman
[11:25:09] Qink name = anlman
Quit requested: Exiting
[11:29:45] Number of jobs = 5
[11:29:45] Starting job 2,CPU time has been restored to 13063.344679.
[11:29:46] Starting new Job
[11:29:46] Qink name = fldman
[11:29:46] Qink name = gesman
[11:29:47] Qink name = scfman
[11:45:03] Qink name = anlman
[11:47:41] End of Job
[11:47:42] Finished Job #2
[11:47:42] Starting job 3,CPU time has been restored to 13748.127500.
[11:47:43] Starting new Job
[11:47:44] Qink name = fldman
Application exited with RC = 0x100
[11:48:04] Finished Job #3
[11:48:04] Starting job 4,CPU time has been restored to 13760.045754.
[11:48:04] Skipping Job #4
11:48:06 (3327): called boinc_finish
BETA_E236437_655_S.364.C42H20N4S4.MZVSYBDUMBXWRY-UHFFFAOYSA-N.19_s1_14_1-- | Linux 4.3.3-303.fc23.x86_64 | 700 | Pending Validation | 3/16/16 20:48:41 | 3/17/16 11:36:50 | 3.82 | 137.4 / 0.0
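
The cost of this suspension can be estimated from the log: everything between the last `Starting job` line and `Quit requested` is redone after the restart. Here is a minimal Python sketch, assuming the line shapes shown above and no midnight crossing. Because the `Quit requested` line carries no timestamp, the last timestamped line before it is used, which gives a lower bound. `lost_seconds` is a hypothetical helper.

```python
import re

STAMP = re.compile(r"^\[(\d\d):(\d\d):(\d\d)\] (.*)")

def lost_seconds(log_text):
    """Lower bound on wall-clock seconds redone after each 'Quit requested'."""
    losses, job_start, last_seen = [], None, None
    for line in log_text.splitlines():
        m = STAMP.match(line)
        if m:
            h, mi, s, rest = m.groups()
            last_seen = int(h) * 3600 + int(mi) * 60 + int(s)
            if rest.startswith("Starting job"):
                job_start = last_seen
        elif line.startswith("Quit requested") and job_start is not None:
            losses.append(last_seen - job_start)
    return losses

# Lines copied from the log above, around the suspension in Job #2.
log = """\
[11:09:48] Starting job 2,CPU time has been restored to 13063.344679.
[11:09:50] Starting new Job
[11:09:51] Qink name = scfman
[11:25:09] Qink name = anlman
Quit requested: Exiting
[11:29:45] Number of jobs = 5
[11:29:45] Starting job 2,CPU time has been restored to 13063.344679.
"""
print(lost_seconds(log))
```

For this WU that is at least 921 seconds, a little over 15 minutes of Job #2 that had to be repeated.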

----------------------------------------
[Edit 1 times, last edit by adriverhoef at Mar 17, 2016 12:17:53 PM]
[Mar 17, 2016 12:16:16 PM]