World Community Grid Forums
Thread Status: Active Total posts in this thread: 90
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1317 Status: Offline |
Had to cancel WU BETA_E236438_410_S.384.C35F2H10N6O5S4.MSQQQGDDEMBVSW-UHFFFAOYSA-N.1_s1_14 under Windows 8.1. Reason: It ignores TThrottle's requests to slow down to avoid overheating the processor. Consequence: The beta WU runs at full speed and the processor is 10 °C above the requested limit, despite the other threads having been reduced to almost no activity.

Interesting discovery, Jean. I tested it on my Win7 desktop and saw the same behavior with TThrottle. It means that wcgrid_beta11_qchem_prod_win32.exe.7.00 is not treated as a child process of the wcgrid_beta11_7.00_windows_intelx86 wrapper process. Manually adding the wcgrid_beta11_qchem_prod_win32.exe.7.00 process worked, and the CPU usage was throttled. |
||
|
KLiK
Master Cruncher Croatia Joined: Nov 13, 2006 Post Count: 3108 Status: Offline |
still testing:
BETA_ E236440_ 112_ S.458.C54H24N2S6.LEEBCYNHZUKVBD-UHFFFAOYSA-N.7_ s1_ 14_ 1-- T60 In Progress 3/16/16 23:44:09 3/20/16 23:44:09 0.00 / 0.00 0.0 / 0.0
BETA_ E236440_ 387_ S.448.C45F2H14N6O5S4.DSBOADLGQJABJM-UHFFFAOYSA-N.1_ s1_ 14_ 1-- DG33FB In Progress 3/16/16 23:15:55 3/20/16 23:15:55 0.00 / 0.00 0.0 / 0.0
BETA_ E236438_ 37_ S.392.C42H18N6O2S4.PQIVMVBSHAZUSL-UHFFFAOYSA-N.18_ s1_ 14_ 1-- VS4 In Progress 3/16/16 21:57:31 3/20/16 21:57:31 0.00 / 0.00 0.0 / 0.0
BETA_ E236438_ 309_ S.392.C44F2H18N4S4.LRMXUPUWGMMTBY-UHFFFAOYSA-N.10_ s1_ 14_ 0-- DP35DP In Progress 3/16/16 21:55:02 3/20/16 21:55:02 0.00 / 0.00 0.0 / 0.0 |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The repair job I mentioned earlier has resulted in Valid for _0 and my _2, and Invalid for _1 that exited in Job #0:
BETA_ E236437_ 292_ S.356.C33H12N6O1S6.DUKWITXLJHZVGL-UHFFFAOYSA-N.11_ s1_ 14_ 2-- Microsoft Windows 10 x64 Edition, (10.00.10586.00) 700 Valid 16/03/16 12:18:56 17/03/16 01:01:32 6.51 219.5 / 198.4
BETA_ E236437_ 292_ S.356.C33H12N6O1S6.DUKWITXLJHZVGL-UHFFFAOYSA-N.11_ s1_ 14_ 1-- Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) 700 Invalid 15/03/16 20:29:39 16/03/16 03:56:25 0.86 24.4 / 24.4
BETA_ E236437_ 292_ S.356.C33H12N6O1S6.DUKWITXLJHZVGL-UHFFFAOYSA-N.11_ s1_ 14_ 0-- Microsoft Windows 8.1 Professional x64 Edition, (06.03.9600.00) 700 Valid 15/03/16 20:29:36 16/03/16 12:18:51 6.86 177.2 / 198.4

My Result Log:
Result Name: BETA_ E236437_ 292_ S.356.C33H12N6O1S6.DUKWITXLJHZVGL-UHFFFAOYSA-N.11_ s1_ 14_ 2--
<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[18:18:40] Number of jobs = 5
[18:18:40] Starting job 0,CPU time has been restored to 0.000000.
[21:36:22] Finished Job #0
[21:36:22] Starting job 1,CPU time has been restored to 11561.703125.
[22:40:22] Finished Job #1
[22:40:22] Starting job 2,CPU time has been restored to 15308.000000.
[22:51:17] Finished Job #2
[22:51:17] Starting job 3,CPU time has been restored to 15944.328125.
Application exited with RC = 0x1
[00:59:24] Finished Job #3
[00:59:24] Starting job 4,CPU time has been restored to 23421.171875.
[00:59:24] Skipping Job #4
00:59:26 (5868): called boinc_finish
</stderr_txt>

I've now also received beta units from batches E236438, E236440 and E236441. |
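For anyone wanting to compare job lengths across result logs, here is a minimal sketch that pulls the wall-clock time per job out of a stderr log like the one above. The function name `job_durations` and the regular expressions are my own assumptions, not part of any WCG or BOINC tool. The log carries no dates, so the sketch assumes a job never runs longer than 24 hours and treats a negative difference as a midnight rollover. After a restart, only the time since the last "Starting job" line is counted.

```python
import re
from datetime import datetime, timedelta

# Excerpt of the result log quoted above (timestamps are wall-clock, no dates).
LOG = """\
[18:18:40] Number of jobs = 5
[18:18:40] Starting job 0,CPU time has been restored to 0.000000.
[21:36:22] Finished Job #0
[21:36:22] Starting job 1,CPU time has been restored to 11561.703125.
[22:40:22] Finished Job #1
[22:40:22] Starting job 2,CPU time has been restored to 15308.000000.
[22:51:17] Finished Job #2
[22:51:17] Starting job 3,CPU time has been restored to 15944.328125.
Application exited with RC = 0x1
[00:59:24] Finished Job #3
"""

# Hypothetical patterns, written to match the line shapes seen in this thread.
START = re.compile(r"\[(\d\d:\d\d:\d\d)\] Starting job (\d+)")
END = re.compile(r"\[(\d\d:\d\d:\d\d)\] Finished Job #(\d+)")


def job_durations(log):
    """Return {job number: wall-clock timedelta since the last start of that job}."""
    starts, durations = {}, {}
    for line in log.splitlines():
        started = START.search(line)
        finished = END.search(line)
        if started:
            # A restart overwrites the earlier start time for the same job.
            starts[int(started.group(2))] = datetime.strptime(started.group(1), "%H:%M:%S")
        elif finished and int(finished.group(2)) in starts:
            job = int(finished.group(2))
            delta = datetime.strptime(finished.group(1), "%H:%M:%S") - starts[job]
            if delta < timedelta(0):
                # Assumption: the job crossed midnight exactly once.
                delta += timedelta(days=1)
            durations[job] = delta
    return durations


for job, delta in sorted(job_durations(LOG).items()):
    print(f"Job #{job}: {delta}")
```

Job #4 is left out of the excerpt on purpose: it is skipped in the log, so it would only report a zero duration.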
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Without trying, had a 437 yesterday, and overnight 3 more of 439-440 queued up behind some fresh FAHB that are in a hurry somehow. The betas are _0 and _1 copies. Will see what comes out.

Edit in: BTW, yesterday's ended with the predominant RC = 0x1 and is still a valid result.
Result Name: BETA_ E236437_ 142_ S.358.C32H14N8S6.QNORDXTXHQLTHG-UHFFFAOYSA-N.12_ s1_ 14_ 0--
<core_client_version>7.6.2</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[13:25:00] Number of jobs = 5
[13:25:00] Starting job 0,CPU time has been restored to 0.000000.
[17:27:26] Finished Job #0
[17:27:26] Starting job 1,CPU time has been restored to 14267.656250.
[18:11:23] Finished Job #1
[18:11:23] Starting job 2,CPU time has been restored to 16883.156250.
[18:25:55] Finished Job #2
[18:25:55] Starting job 3,CPU time has been restored to 17740.765625.
Application exited with RC = 0x1
[21:19:30] Finished Job #3
[21:19:30] Starting job 4,CPU time has been restored to 28069.062500.
[21:19:30] Skipping Job #4
21:19:33 (4552): called boinc_finish
</stderr_txt>
]]>
[Edit 1 times, last edit by SekeRob* at Mar 17, 2016 9:38:04 AM] |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Just discovered the 4770 has 437-438 running on all 8 cores, with nothing else buffered, though this is the OET-dedicated machine [no special attempt to fetch Beta]. The buffer is set to 0.75 days, but all 8 betas, after running 1:01 to 1:43 hours, show a TTC of 1:16 days+, or 40 hours remaining. No wonder nothing is sitting in the queue, which brings up a long-standing issue. Everyone knows these things won't ever run longer than 18 hours. Can't this monkey be taught the trick of wearing a cap that says "18 and no more we do, mama"? Can even think of an app_config solution to achieve this.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
With my 4770K, I'm using app_config to run only 4 CEP2 Beta at a time, but that only prolongs the agony of overestimated Time to Completion, which is currently 47 hours for each Beta unit, versus actual completion times so far of 6 to 11 hours. I'm micro-managing the queue in order to keep some OET available (switching briefly at least twice per day to a profile for OET instead of Beta, and adjusting queue size to suit).
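A minimal sketch of generating an app_config.xml that caps the number of concurrent beta tasks, as described above. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements are standard BOINC client configuration; the app name `beta11` is only an assumption inferred from the executable names mentioned in this thread, so check the actual short name in your client_state.xml before using it. The helper name `build_app_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Build the text of an app_config.xml limiting one app to max_concurrent tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if hasattr(ET, "indent"):
        # Pretty-print where available (Python 3.9+); BOINC does not need it.
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "beta11" is an assumed app name, not confirmed by this thread.
config_text = build_app_config("beta11", 4)
print(config_text)
```

The resulting text would be saved as app_config.xml in the project's folder inside the BOINC data directory, then picked up by having the client re-read its config files. Note that this only limits how many run at once; it does nothing about the overestimated Time to Completion.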
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
My one WU checkpointed at about 4 hours. I suspended and restarted it twice (LAIM off) with no problem. The initial estimated time was about 6 hours. At the 4-hour run time it still showed 5 hours to finish. It ended about 45 minutes later after completing job 3.

Ran on my iMac, OS X 10.11.3.
Result Log
Result Name: BETA_ E236439_ 529_ S.420.C42H16N8S6.GIGTUVLDTLZZPV-UHFFFAOYSA-N.18_ s1_ 14_ 1--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[17:43:16] Number of jobs = 5
[17:43:16] Starting job 0,CPU time has been restored to 0.000000.
[17:43:17] Starting new Job
[17:43:17] Qink name = fldman
[17:43:19] Qink name = gesman
[17:43:20] Qink name = scfman
[18:28:38] Qink name = anlman
[18:28:38] Qink name = drvman
[18:30:23] Qink name = optman
[18:30:23] Qink name = fldman
[18:30:23] Qink name = gesman
[18:30:25] Qink name = scfman
[18:41:46] Qink name = anlman
[18:41:46] Qink name = drvman
[18:43:29] Qink name = optman
[18:43:30] Qink name = fldman
[18:43:30] Qink name = gesman
[18:43:31] Qink name = scfman
[18:54:51] Qink name = anlman
[18:54:51] Qink name = drvman
[18:56:35] Qink name = optman
[18:56:35] Qink name = fldman
[18:56:35] Qink name = gesman
[18:56:37] Qink name = scfman
[19:07:00] Qink name = anlman
[19:07:00] Qink name = drvman
[19:08:44] Qink name = optman
[19:08:44] Qink name = fldman
[19:08:44] Qink name = gesman
[19:08:46] Qink name = scfman
[19:19:17] Qink name = anlman
[19:19:17] Qink name = drvman
[19:21:01] Qink name = optman
[19:21:01] Qink name = fldman
[19:21:01] Qink name = gesman
[19:21:03] Qink name = scfman
[19:30:43] Qink name = anlman
[19:30:43] Qink name = drvman
[19:32:27] Qink name = optman
[19:32:27] Qink name = fldman
[19:32:27] Qink name = gesman
[19:32:29] Qink name = scfman
[19:42:12] Qink name = anlman
[19:42:12] Qink name = drvman
[19:43:38] Qink name = optman
[19:43:38] Qink name = fldman
[19:43:38] Qink name = gesman
[19:43:40] Qink name = scfman
[19:53:20] Qink name = anlman
[19:53:20] Qink name = drvman
[19:55:06] Qink name = optman
[19:55:07] Qink name = fldman
[19:55:07] Qink name = gesman
[19:55:09] Qink name = scfman
[20:05:18] Qink name = anlman
[20:05:18] Qink name = drvman
[20:07:02] Qink name = optman
[20:07:03] Qink name = fldman
[20:07:03] Qink name = gesman
[20:07:04] Qink name = scfman
[20:17:43] Qink name = anlman
[20:17:43] Qink name = drvman
[20:19:26] Qink name = optman
[20:19:27] Qink name = fldman
[20:19:27] Qink name = gesman
[20:19:29] Qink name = scfman
[20:30:07] Qink name = anlman
[20:30:07] Qink name = drvman
[20:31:50] Qink name = optman
[20:31:50] Qink name = fldman
[20:31:50] Qink name = gesman
[20:31:52] Qink name = scfman
[20:41:26] Qink name = anlman
[20:41:26] Qink name = drvman
[20:43:09] Qink name = optman
[20:43:09] Qink name = fldman
[20:43:09] Qink name = gesman
[20:43:11] Qink name = scfman
[20:52:34] Qink name = anlman
[20:52:34] Qink name = drvman
[20:54:17] Qink name = optman
[20:54:17] Qink name = fldman
[20:54:17] Qink name = gesman
[20:54:19] Qink name = scfman
[21:02:56] Qink name = anlman
[21:02:56] Qink name = drvman
[21:04:39] Qink name = optman
[21:04:39] Qink name = fldman
[21:04:39] Qink name = gesman
[21:04:41] Qink name = scfman
[21:12:44] Qink name = anlman
[21:12:44] Qink name = drvman
[21:14:27] Qink name = optman
[21:14:27] Qink name = fldman
[21:14:27] Qink name = gesman
[21:14:29] Qink name = scfman
[21:21:38] Qink name = anlman
[21:21:38] Qink name = drvman
[21:23:21] Qink name = optman
[21:23:21] Qink name = anlman
[21:31:31] End of Job
[21:31:38] Finished Job #0
[21:31:38] Starting job 1,CPU time has been restored to 13113.609595.
[21:31:41] Starting new Job
[21:31:41] Qink name = fldman
[21:31:43] Qink name = gesman
[21:31:43] Qink name = scfman
[21:44:46] Qink name = anlman
[21:52:22] End of Job
[21:52:29] Finished Job #1
[21:52:29] Starting job 2,CPU time has been restored to 14333.385774.
[21:52:32] Starting new Job
[21:52:33] Qink name = fldman
[21:52:34] Qink name = gesman
[21:52:34] Qink name = scfman
[22:03:11] Qink name = anlman
[22:11:15] End of Job
[22:11:22] Finished Job #2
[22:11:22] Starting job 3,CPU time has been restored to 15431.434208.
[22:11:25] Starting new Job
[22:11:26] Qink name = fldman
[22:11:35] Qink name = gesman
[22:11:37] Qink name = scfman
Quit requested: Exiting
[22:12:47] Number of jobs = 5
[22:12:47] Starting job 3,CPU time has been restored to 15431.434208.
[22:12:51] Starting new Job
[22:12:51] Qink name = fldman
[22:13:00] Qink name = gesman
[22:13:02] Qink name = scfman
Quit requested: Exiting
[22:29:53] Number of jobs = 5
[22:29:53] Starting job 3,CPU time has been restored to 15431.434208.
[22:29:57] Starting new Job
[22:29:57] Qink name = fldman
[22:30:05] Qink name = gesman
[22:30:06] Qink name = scfman
Application exited with RC = 0x100
[22:57:55] Finished Job #3
[22:57:55] Starting job 4,CPU time has been restored to 17070.930537.
[22:57:55] Skipping Job #4
called boinc_finish
</stderr_txt>
]]>
[Edit 1 times, last edit by Former Member at Mar 17, 2016 10:48:30 AM] |
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
If anything is an indicator, the slot info says the app is v7.00, the same as used in production. Don't think versioning is easily compromised with the way development works.
wcgrid_beta11_7.00_windows_intelx86
Got 8 running concurrently on the W8.1-64/4770, showing 96.9-97.4% efficiency after 2:09 to 2:51 hours. That's all without any checkpoint having been recorded, so it could plummet when those start happening [it did a few weeks ago on my Linux with 4 concurrent. All fine and dandy at >98% efficiency, awestruck with the just-upgraded 4.2.1 LTS kernel, but then it fell rapidly, with all finishing at 92-93%... which was the normal of old]. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
For another data point, my W10-64/4770 running just 4 at a time is giving 96% or 97% efficiency.
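A minimal sketch of how such an efficiency figure can be reproduced from a result log, assuming "efficiency" here means CPU time divided by elapsed wall-clock time. The helper names are hypothetical. The sample numbers come from a result log posted earlier in this thread: the run started at 18:18:40, and at 22:51:17 the log reported the CPU time as 15944.328125 seconds.

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS log timestamp to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """Return CPU time as a percentage of elapsed wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return 100.0 * cpu_seconds / elapsed_seconds


# Both timestamps fall on the same day, so no midnight correction is needed.
elapsed = hms_to_seconds("22:51:17") - hms_to_seconds("18:18:40")
efficiency = cpu_efficiency(15944.328125, elapsed)
print(f"{efficiency:.1f}% over {elapsed} s elapsed")
```

This only covers the part of the run before job 3, so it says nothing about how the figure moves later in the task.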
|
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2104 Status: Offline |
Did a test with suspension of a BETA WU (LAIM on).
BETA_E236437_655_S.364.C42H20N4S4.MZVSYBDUMBXWRY-UHFFFAOYSA-N.19_s1_14_1--
Suspended the WU in Qink 'anlman' in Job#2; when the WU restarted it started at first Qink 'fldman' in Job#2.
...
[10:50:34] Finished Job #0
[10:50:34] Starting job 1,CPU time has been restored to 12290.427041.
[10:50:35] Starting new Job
[10:50:35] Qink name = fldman
[10:50:37] Qink name = gesman
[10:50:37] Qink name = scfman
[11:07:23] Qink name = anlman
[11:09:46] End of Job
[11:09:48] Finished Job #1
[11:09:48] Starting job 2,CPU time has been restored to 13063.344679.
[11:09:50] Starting new Job
[11:09:50] Qink name = fldman
[11:09:51] Qink name = gesman
[11:09:51] Qink name = scfman
[11:25:09] Qink name = anlman
Quit requested: Exiting
[11:29:45] Number of jobs = 5
[11:29:45] Starting job 2,CPU time has been restored to 13063.344679.
[11:29:46] Starting new Job
[11:29:46] Qink name = fldman
[11:29:46] Qink name = gesman
[11:29:47] Qink name = scfman
[11:45:03] Qink name = anlman
[11:47:41] End of Job
[11:47:42] Finished Job #2
[11:47:42] Starting job 3,CPU time has been restored to 13748.127500.
[11:47:43] Starting new Job
[11:47:44] Qink name = fldman
Application exited with RC = 0x100
[11:48:04] Finished Job #3
[11:48:04] Starting job 4,CPU time has been restored to 13760.045754.
[11:48:04] Skipping Job #4
11:48:06 (3327): called boinc_finish

BETA_E236437_655_S.364.C42H20N4S4.MZVSYBDUMBXWRY-UHFFFAOYSA-N.19_s1_14_1-- Linux 4.3.3-303.fc23.x86_64
[Edit 1 times, last edit by adriverhoef at Mar 17, 2016 12:17:53 PM] |
||
|