| World Community Grid Forums
Thread Status: Active Total posts in this thread: 23
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The latest _4 result upload files are now in the 54 MB+ range and, per the title, I'm seeing the interspersed 'error'. As with the RC=0 error aborts, I can't see this as a host-originated problem and propose these be treated with credit for time. Here's one such result quorum:
E216403_ 032_ I.40.C30H16N4O6.00170204.0.set1d06_ 2-- 640 Valid 10/22/13 10:18:36 10/24/13 00:27:33 4.77 299.0 / 281.2
E216403_ 032_ I.40.C30H16N4O6.00170204.0.set1d06_ 1-- 640 Error 10/20/13 17:37:01 10/22/13 09:55:27 8.69 281.7 / 0.0
E216403_ 032_ I.40.C30H16N4O6.00170204.0.set1d06_ 0-- 640 Valid 10/20/13 17:22:05 10/21/13 16:48:40 4.39 263.4 / 281.2

[10:34:16] Finished Job #11
[10:34:16] Starting job 12,CPU time has been restored to 26604.280000.
[10:34:17] Starting new Job
[10:34:17] Qink name = fldman
[10:34:26] Qink name = gesman
[10:34:27] Qink name = scfman
[11:34:22] Qink name = anlman
Abort requested: Exiting
</stderr_txt> ]]> |
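For anyone comparing these logs, here is a rough Python sketch that turns the `Qink name` timestamps into per-stage durations. It assumes the stderr format shown in this thread; the function name `stage_durations` is made up for illustration, and a stage's duration is taken as the time until the next `Qink name` entry, which is an approximation.

```python
import re
from datetime import datetime, timedelta

# Matches entries such as "[10:34:27] Qink name = scfman"
STAGE_RE = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] Qink name = (\w+)")

def stage_durations(stderr_text):
    """Return (stage, seconds) pairs, timing each stage until the next one starts."""
    entries = [
        (datetime.strptime(stamp, "%H:%M:%S"), name)
        for stamp, name in STAGE_RE.findall(stderr_text)
    ]
    durations = []
    for (start, name), (end, _next_name) in zip(entries, entries[1:]):
        delta = end - start
        if delta < timedelta(0):
            delta += timedelta(days=1)  # the log rolled past midnight
        durations.append((name, int(delta.total_seconds())))
    return durations

log = (
    "[10:34:17] Starting new Job [10:34:17] Qink name = fldman "
    "[10:34:26] Qink name = gesman [10:34:27] Qink name = scfman "
    "[11:34:22] Qink name = anlman Abort requested: Exiting"
)

for name, seconds in stage_durations(log):
    print(name, seconds)
```

On the excerpt above, this shows the `scfman` stage of job 12 running for roughly an hour before `anlman` starts and the abort follows.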
||
|
|
branjo
Master Cruncher Slovakia Joined: Jun 29, 2012 Post Count: 1892 Status: Offline Project Badges:
|
Yes, got one of them on Mac OS X Mavericks, 7.0.65, i5-2500S @2.70 GHz
E216435_ 630_ I.41.C33H14N6OS.00280024.0.set1d06_ 2-- - In Progress 23.10.2013 11:53:18 2.11.2013 11:53:18 0.00 0.0 / 0.0
E216435_ 630_ I.41.C33H14N6OS.00280024.0.set1d06_ 1-- 640 Error 22.10.2013 04:16:10 23.10.2013 11:43:05 8.96 313.0 / 0.0
E216435_ 630_ I.41.C33H14N6OS.00280024.0.set1d06_ 0-- 640 Pending Validation 22.10.2013 03:26:34 23.10.2013 23:19:14 11.19 421.7 / 0.0

[12:03:49] Finished Job #11
[12:03:49] Starting job 12,CPU time has been restored to 26906.389554.
[12:03:51] Starting new Job
[12:03:51] Qink name = fldman
[12:03:57] Qink name = gesman
[12:03:58] Qink name = scfman
[13:17:31] Qink name = anlman
Abort requested: Exiting
</stderr_txt>

Cheers
----------------------------------------
Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006 |
||
|
|
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 397 Status: Offline Project Badges:
|
I've been getting the same errors on an i5-650 Ubuntu 12.04 box with a 250GB hard drive. Was running four units with no problems until these big ones arrived. Cut back to one task per host and the errors went away.
UPDATE: Increased Maximum Disk Usage setting to 20GB. Maybe this will fix the problem?
Result Name: E216438_ 989_ I.41.C34H14N6S.00218872.1.set1d06_ 0-- <core_client_version>7.0.27</core_client_version> <![CDATA[ <message> process got signal 11 </message> <stderr_txt> INFO: No state to restore. Start from the beginning. [02:12:41] Number of jobs = 16 [02:12:41] Starting job 0,CPU time has been restored to 0.000000. [02:12:41] Starting new Job [02:12:41] Qink name = fldman [02:12:41] Qink name = gesman [02:12:41] Qink name = scfman [02:14:49] Qink name = anlman [02:14:51] End of Job [02:14:52] Finished Job #0 [02:14:52] Starting job 1,CPU time has been restored to 122.723669. [02:14:52] Starting new Job [02:14:52] Qink name = fldman [02:14:52] Qink name = gesman [02:14:52] Qink name = scfman [02:21:27] Qink name = anlman [02:23:31] End of Job [02:23:32] Finished Job #1 [02:23:32] Starting job 2,CPU time has been restored to 621.178820. [02:23:32] Starting new Job [02:23:32] Qink name = fldman [02:23:33] Qink name = gesman [02:23:33] Qink name = scfman </stderr_txt> ]]>
Result Name: E216383_ 303_ I.40.C31F6H15N3.00203017.2.set1d06_ 0-- <core_client_version>7.0.27</core_client_version> <![CDATA[ <message> process got signal 11 </message> <stderr_txt> INFO: No state to restore. Start from the beginning. [16:42:50] Number of jobs = 16 [16:42:50] Starting job 0,CPU time has been restored to 0.000000. [16:42:50] Starting new Job [16:42:50] Qink name = fldman [16:42:50] Qink name = gesman [16:42:50] Qink name = scfman [16:46:03] Qink name = anlman [16:46:07] End of Job [16:46:08] Finished Job #0 [16:46:08] Starting job 1,CPU time has been restored to 182.935432. [16:46:08] Starting new Job [16:46:08] Qink name = fldman [16:46:09] Qink name = gesman [16:46:09] Qink name = scfman [16:56:10] Qink name = anlman [16:59:07] End of Job [16:59:09] Finished Job #1 [16:59:09] Starting job 2,CPU time has been restored to 929.882113.
[16:59:09] Starting new Job [16:59:09] Qink name = fldman [16:59:10] Qink name = gesman [16:59:10] Qink name = scfman [17:07:34] Qink name = anlman [17:07:34] Qink name = drvman [17:09:30] Qink name = optman [17:09:30] Qink name = fldman [17:09:30] Qink name = gesman [17:09:31] Qink name = scfman [17:22:21] Qink name = anlman [17:22:21] Qink name = drvman [17:24:15] Qink name = optman [17:24:15] Qink name = fldman [17:24:15] Qink name = gesman [17:24:16] Qink name = scfman [17:37:00] Qink name = anlman [17:37:00] Qink name = drvman [17:38:53] Qink name = optman [17:38:53] Qink name = fldman [17:38:53] Qink name = gesman [17:38:54] Qink name = scfman [17:50:55] Qink name = anlman [17:50:55] Qink name = drvman [17:52:48] Qink name = optman [17:52:48] Qink name = fldman [17:52:48] Qink name = gesman [17:52:50] Qink name = scfman [18:04:31] Qink name = anlman [18:04:31] Qink name = drvman [18:06:23] Qink name = optman [18:06:24] Qink name = fldman [18:06:24] Qink name = gesman [18:06:25] Qink name = scfman [18:18:03] Qink name = anlman [18:18:03] Qink name = drvman [18:19:56] Qink name = optman [18:19:56] Qink name = fldman [18:19:56] Qink name = gesman [18:19:57] Qink name = scfman [18:31:05] Qink name = anlman [18:31:05] Qink name = drvman [18:32:58] Qink name = optman [18:32:59] Qink name = fldman [18:32:59] Qink name = gesman [18:33:00] Qink name = scfman [18:44:34] Qink name = anlman [18:44:34] Qink name = drvman [18:46:24] Qink name = optman [18:46:24] Qink name = fldman [18:46:24] Qink name = gesman [18:46:25] Qink name = scfman Parent was killed, exiting </stderr_txt> ]]> Result Name: E216374_ 172_ I.39.C29H17N5O5.00359195.4.set1d06_ 1-- <core_client_version>7.0.27</core_client_version> <![CDATA[ <message> process got signal 11 </message> <stderr_txt> INFO: No state to restore. Start from the beginning. [07:52:29] Number of jobs = 16 [07:52:29] Starting job 0,CPU time has been restored to 0.000000. 
[07:52:29] Starting new Job [07:52:30] Qink name = fldman [07:52:30] Qink name = gesman [07:52:30] Qink name = scfman [08:03:31] Qink name = anlman [08:03:35] End of Job [08:03:36] Finished Job #0 [08:03:36] Starting job 1,CPU time has been restored to 621.494841. [08:03:36] Starting new Job [08:03:36] Qink name = fldman [08:03:37] Qink name = gesman [08:03:38] Qink name = scfman [08:20:58] Qink name = anlman [08:23:30] End of Job [08:23:32] Finished Job #1 [08:23:32] Starting job 2,CPU time has been restored to 1756.185754. [08:23:32] Starting new Job [08:23:32] Qink name = fldman [08:23:33] Qink name = gesman [08:23:33] Qink name = scfman [08:33:21] Qink name = anlman [08:33:21] Qink name = drvman [08:35:32] Qink name = optman [08:35:32] Qink name = fldman [08:35:32] Qink name = gesman [08:35:34] Qink name = scfman [08:58:00] Qink name = anlman [08:58:00] Qink name = drvman [09:00:10] Qink name = optman [09:00:11] Qink name = fldman [09:00:11] Qink name = gesman [09:00:12] Qink name = scfman [09:18:52] Qink name = anlman [09:18:52] Qink name = drvman [09:20:59] Qink name = optman [09:20:59] Qink name = fldman [09:20:59] Qink name = gesman [09:21:00] Qink name = scfman [09:38:20] Qink name = anlman [09:38:20] Qink name = drvman [09:40:28] Qink name = optman [09:40:28] Qink name = fldman [09:40:28] Qink name = gesman [09:40:29] Qink name = scfman [09:56:40] Qink name = anlman [09:56:40] Qink name = drvman [09:58:45] Qink name = optman [09:58:45] Qink name = fldman [09:58:45] Qink name = gesman [09:58:47] Qink name = scfman [10:13:11] Qink name = anlman [10:13:11] Qink name = drvman [10:15:15] Qink name = optman [10:15:15] Qink name = fldman [10:15:15] Qink name = gesman [10:15:17] Qink name = scfman [10:28:45] Qink name = anlman [10:28:45] Qink name = drvman [10:30:49] Qink name = optman [10:30:49] Qink name = fldman [10:30:49] Qink name = gesman [10:30:50] Qink name = scfman [10:44:22] Qink name = anlman [10:44:22] Qink name = drvman [10:46:27] Qink name = optman 
[10:46:27] Qink name = fldman [10:46:27] Qink name = gesman [10:46:28] Qink name = scfman [11:00:25] Qink name = anlman [11:00:25] Qink name = drvman [11:02:30] Qink name = optman [11:02:30] Qink name = fldman [11:02:30] Qink name = gesman [11:02:31] Qink name = scfman [11:16:17] Qink name = anlman [11:16:17] Qink name = drvman [11:18:23] Qink name = optman [11:18:23] Qink name = fldman [11:18:23] Qink name = gesman [11:18:24] Qink name = scfman [11:32:13] Qink name = anlman [11:32:13] Qink name = drvman [11:34:18] Qink name = optman [11:34:18] Qink name = fldman [11:34:18] Qink name = gesman [11:34:20] Qink name = scfman [11:49:36] Qink name = anlman [11:49:36] Qink name = drvman [11:51:34] Qink name = optman [11:51:34] Qink name = fldman [11:51:34] Qink name = gesman [11:51:35] Qink name = scfman [12:06:42] Qink name = anlman [12:06:43] Qink name = drvman [12:09:07] Qink name = optman [12:09:07] Qink name = fldman [12:09:07] Qink name = gesman [12:09:08] Qink name = scfman [12:22:46] Qink name = anlman [12:22:46] Qink name = drvman [12:25:03] Qink name = optman [12:25:03] Qink name = fldman [12:25:03] Qink name = gesman [12:25:04] Qink name = scfman [12:36:59] Qink name = anlman [12:36:59] Qink name = drvman [12:39:06] Qink name = optman [12:39:06] Qink name = fldman [12:39:06] Qink name = gesman [12:39:08] Qink name = scfman [12:50:45] Qink name = anlman [12:50:45] Qink name = drvman [12:52:54] Qink name = optman [12:52:54] Qink name = fldman [12:52:54] Qink name = gesman [12:52:55] Qink name = scfman [13:03:43] Qink name = anlman [13:03:43] Qink name = drvman [13:05:49] Qink name = optman [13:05:50] Qink name = fldman [13:05:50] Qink name = gesman [13:05:51] Qink name = scfman [13:16:24] Qink name = anlman [13:16:24] Qink name = drvman [13:18:33] Qink name = optman [13:18:33] Qink name = fldman [13:18:33] Qink name = gesman [13:18:34] Qink name = scfman [13:28:35] Qink name = anlman [13:28:35] Qink name = drvman [13:30:43] Qink name = optman [13:30:43] Qink 
name = anlman [13:34:51] End of Job [13:34:52] Finished Job #2 [13:34:52] Starting job 3,CPU time has been restored to 19421.905792. [13:34:52] Starting new Job [13:34:53] Qink name = fldman [13:34:54] Qink name = gesman [13:34:54] Qink name = scfman [13:48:46] Qink name = anlman [13:52:44] End of Job [13:52:45] Finished Job #3 [13:52:45] Starting job 4,CPU time has been restored to 20445.409757. [13:52:45] Starting new Job [13:52:45] Qink name = fldman [13:52:47] Qink name = gesman [13:52:47] Qink name = scfman [14:02:45] Qink name = anlman [14:06:10] End of Job [14:06:10] Finished Job #4 [14:06:10] Starting job 5,CPU time has been restored to 21232.926973. [14:06:10] Starting new Job [14:06:11] Qink name = fldman [14:06:12] Qink name = gesman [14:06:12] Qink name = scfman [14:17:08] Qink name = anlman [14:20:26] End of Job [14:20:27] Finished Job #5 [14:20:27] Starting job 6,CPU time has been restored to 22065.278991. [14:20:27] Starting new Job [14:20:27] Qink name = fldman [14:20:28] Qink name = gesman [14:20:28] Qink name = scfman [14:30:26] Qink name = anlman [14:34:42] End of Job [14:34:43] Finished Job #6 [14:34:43] Starting job 7,CPU time has been restored to 22903.227359. [14:34:43] Starting new Job [14:34:43] Qink name = fldman [14:34:44] Qink name = gesman [14:34:44] Qink name = scfman [14:49:19] Qink name = anlman [14:52:28] End of Job [14:52:30] Finished Job #7 [14:52:30] Starting job 8,CPU time has been restored to 23944.636443. [14:52:30] Starting new Job [14:52:30] Qink name = fldman [14:52:31] Qink name = gesman [14:52:31] Qink name = scfman [15:01:24] Qink name = anlman [15:04:42] End of Job [15:04:43] Finished Job #8 [15:04:43] Starting job 9,CPU time has been restored to 24669.797762. 
[15:04:43] Starting new Job [15:04:43] Qink name = fldman [15:04:44] Qink name = gesman [15:04:44] Qink name = scfman [15:14:57] Qink name = anlman [15:19:14] End of Job [15:19:15] Finished Job #9 [15:19:15] Starting job 10,CPU time has been restored to 25534.651812. [15:19:15] Starting new Job [15:19:15] Qink name = fldman [15:19:16] Qink name = gesman [15:19:16] Qink name = scfman [15:44:40] Qink name = anlman [15:48:39] End of Job [15:48:40] Finished Job #10 [15:48:40] Starting job 11,CPU time has been restored to 27268.048142. [15:48:40] Starting new Job [15:48:40] Qink name = fldman [15:48:41] Qink name = gesman [15:48:41] Qink name = scfman [16:02:07] Qink name = anlman [16:06:14] End of Job [16:06:15] Finished Job #11 [16:06:15] Starting job 12,CPU time has been restored to 28298.652550. [16:06:15] Starting new Job [16:06:15] Qink name = fldman [16:06:22] Qink name = gesman [16:06:23] Qink name = scfman </stderr_txt> ]]> Result Name: E216350_ 668_ I.40.C29F4H13N3O4.00408283.2.set1d06_ 0-- <core_client_version>7.0.27</core_client_version> <![CDATA[ <message> Maximum disk usage exceeded </message> <stderr_txt> INFO: No state to restore. Start from the beginning. [06:25:03] Number of jobs = 16 [06:25:03] Starting job 0,CPU time has been restored to 0.000000. [06:25:03] Starting new Job [06:25:03] Qink name = fldman [06:25:03] Qink name = gesman [06:25:03] Qink name = scfman [06:29:23] Qink name = anlman [06:29:27] End of Job [06:29:28] Finished Job #0 [06:29:28] Starting job 1,CPU time has been restored to 248.831551. [06:29:28] Starting new Job [06:29:28] Qink name = fldman [06:29:29] Qink name = gesman [06:29:29] Qink name = scfman [06:43:28] Qink name = anlman [06:45:58] End of Job [06:45:59] Finished Job #1 [06:45:59] Starting job 2,CPU time has been restored to 1194.582656. 
[06:45:59] Starting new Job [06:45:59] Qink name = fldman [06:46:00] Qink name = gesman [06:46:00] Qink name = scfman [06:58:17] Qink name = anlman [06:58:17] Qink name = drvman [07:01:03] Qink name = optman [07:01:03] Qink name = fldman [07:01:03] Qink name = gesman [07:01:04] Qink name = scfman [07:21:36] Qink name = anlman [07:21:36] Qink name = drvman [07:24:19] Qink name = optman [07:24:19] Qink name = fldman [07:24:19] Qink name = gesman [07:24:21] Qink name = scfman [07:42:58] Qink name = anlman [07:42:58] Qink name = drvman [07:45:39] Qink name = optman [07:45:39] Qink name = fldman [07:45:39] Qink name = gesman [07:45:41] Qink name = scfman [08:03:43] Qink name = anlman [08:03:44] Qink name = drvman [08:06:25] Qink name = optman [08:06:25] Qink name = fldman [08:06:25] Qink name = gesman [08:06:26] Qink name = scfman [08:23:30] Qink name = anlman [08:23:30] Qink name = drvman [08:26:06] Qink name = optman [08:26:06] Qink name = fldman [08:26:06] Qink name = gesman [08:26:08] Qink name = scfman [08:42:49] Qink name = anlman [08:42:49] Qink name = drvman [08:45:25] Qink name = optman [08:45:25] Qink name = fldman [08:45:25] Qink name = gesman [08:45:27] Qink name = scfman [09:01:58] Qink name = anlman [09:01:58] Qink name = drvman [09:04:37] Qink name = optman [09:04:38] Qink name = fldman [09:04:38] Qink name = gesman [09:04:39] Qink name = scfman [09:21:06] Qink name = anlman [09:21:06] Qink name = drvman [09:23:43] Qink name = optman [09:23:44] Qink name = fldman [09:23:44] Qink name = gesman [09:23:45] Qink name = scfman [09:39:52] Qink name = anlman [09:39:52] Qink name = drvman [09:42:25] Qink name = optman [09:42:25] Qink name = fldman [09:42:25] Qink name = gesman [09:42:26] Qink name = scfman [09:58:30] Qink name = anlman [09:58:30] Qink name = drvman [10:01:03] Qink name = optman [10:01:03] Qink name = fldman [10:01:03] Qink name = gesman [10:01:04] Qink name = scfman [10:17:00] Qink name = anlman [10:17:00] Qink name = drvman [10:19:33] Qink name 
= optman [10:19:33] Qink name = fldman [10:19:33] Qink name = gesman [10:19:34] Qink name = scfman [10:33:09] Qink name = anlman [10:33:09] Qink name = drvman [10:35:45] Qink name = optman [10:35:46] Qink name = fldman [10:35:46] Qink name = gesman [10:35:47] Qink name = scfman [10:48:12] Qink name = anlman [10:48:12] Qink name = drvman [10:50:46] Qink name = optman [10:50:46] Qink name = fldman [10:50:46] Qink name = gesman [10:50:47] Qink name = scfman [11:01:52] Qink name = anlman [11:01:52] Qink name = drvman [11:04:27] Qink name = optman [11:04:27] Qink name = fldman [11:04:27] Qink name = gesman [11:04:28] Qink name = scfman [11:15:33] Qink name = anlman [11:15:33] Qink name = drvman [11:18:04] Qink name = optman [11:18:04] Qink name = fldman [11:18:04] Qink name = gesman [11:18:05] Qink name = scfman [11:29:07] Qink name = anlman [11:29:07] Qink name = drvman [11:31:41] Qink name = optman [11:31:41] Qink name = anlman [11:34:15] End of Job [11:34:17] Finished Job #2 [11:34:17] Starting job 3,CPU time has been restored to 17634.914112. [11:34:17] Starting new Job [11:34:17] Qink name = fldman [11:34:18] Qink name = gesman [11:34:18] Qink name = scfman [11:49:31] Qink name = anlman [11:52:05] End of Job [11:52:05] Finished Job #3 [11:52:05] Starting job 4,CPU time has been restored to 18652.281693. [11:52:05] Starting new Job [11:52:06] Qink name = fldman [11:52:07] Qink name = gesman [11:52:07] Qink name = scfman [12:04:14] Qink name = anlman [12:06:35] End of Job [12:06:36] Finished Job #4 [12:06:36] Starting job 5,CPU time has been restored to 19504.566957. [12:06:36] Starting new Job [12:06:37] Qink name = fldman [12:06:38] Qink name = gesman [12:06:38] Qink name = scfman [12:17:57] Qink name = anlman [12:20:25] End of Job [12:20:25] Finished Job #5 [12:20:25] Starting job 6,CPU time has been restored to 20311.721401. 
[12:20:26] Starting new Job [12:20:26] Qink name = fldman [12:20:27] Qink name = gesman [12:20:27] Qink name = scfman [12:31:06] Qink name = anlman [12:33:30] End of Job [12:33:31] Finished Job #6 [12:33:31] Starting job 7,CPU time has been restored to 21081.253493. [12:33:32] Starting new Job [12:33:32] Qink name = fldman [12:33:33] Qink name = gesman [12:33:33] Qink name = scfman [12:49:51] Qink name = anlman [12:52:21] End of Job [12:52:22] Finished Job #7 [12:52:22] Starting job 8,CPU time has been restored to 22184.202423. [12:52:22] Starting new Job [12:52:22] Qink name = fldman [12:52:23] Qink name = gesman [12:52:23] Qink name = scfman [13:02:48] Qink name = anlman [13:05:14] End of Job [13:05:15] Finished Job #8 [13:05:15] Starting job 9,CPU time has been restored to 22950.742328. [13:05:15] Starting new Job [13:05:15] Qink name = fldman [13:05:16] Qink name = gesman [13:05:17] Qink name = scfman [13:16:18] Qink name = anlman [13:19:47] End of Job [13:19:49] Finished Job #9 [13:19:49] Starting job 10,CPU time has been restored to 23815.036343. [13:19:49] Starting new Job [13:19:49] Qink name = fldman [13:19:50] Qink name = gesman [13:19:50] Qink name = scfman [13:47:36] Qink name = anlman [13:52:39] End of Job [13:52:40] Finished Job #10 [13:52:40] Starting job 11,CPU time has been restored to 25749.377231. [13:52:40] Starting new Job [13:52:40] Qink name = fldman [13:52:41] Qink name = gesman [13:52:41] Qink name = scfman [14:07:26] Qink name = anlman [14:11:48] End of Job [14:11:49] Finished Job #11 [14:11:49] Starting job 12,CPU time has been restored to 26869.067207. 
[14:11:49] Starting new Job [14:11:49] Qink name = fldman [14:11:55] Qink name = gesman [14:11:57] Qink name = scfman [16:17:39] Qink name = anlman Abort requested: Exiting </stderr_txt> ]]> Result Name: E216256_ 807_ I.40.C25F9H8N5O.00232392.4.set1d06_ 0-- <core_client_version>7.0.27</core_client_version> <![CDATA[ <message> process got signal 11 </message> <stderr_txt> INFO: No state to restore. Start from the beginning. [09:45:36] Number of jobs = 16 [09:45:36] Starting job 0,CPU time has been restored to 0.000000. [09:45:36] Starting new Job [09:45:37] Qink name = fldman [09:45:37] Qink name = gesman [09:45:37] Qink name = scfman [09:49:32] Qink name = anlman [09:49:36] End of Job [09:49:37] Finished Job #0 [09:49:37] Starting job 1,CPU time has been restored to 216.357521. [09:49:37] Starting new Job [09:49:37] Qink name = fldman [09:49:38] Qink name = gesman [09:49:38] Qink name = scfman [10:01:52] Qink name = anlman [10:03:54] End of Job [10:03:55] Finished Job #1 [10:03:55] Starting job 2,CPU time has been restored to 993.266074. 
[10:03:55] Starting new Job [10:03:55] Qink name = fldman [10:03:56] Qink name = gesman [10:03:57] Qink name = scfman [10:14:45] Qink name = anlman [10:14:45] Qink name = drvman [10:17:16] Qink name = optman [10:17:16] Qink name = fldman [10:17:16] Qink name = gesman [10:17:17] Qink name = scfman [10:35:41] Qink name = anlman [10:35:41] Qink name = drvman [10:38:15] Qink name = optman [10:38:15] Qink name = fldman [10:38:15] Qink name = gesman [10:38:16] Qink name = scfman [10:56:45] Qink name = anlman [10:56:45] Qink name = drvman [10:59:09] Qink name = optman [10:59:09] Qink name = fldman [10:59:09] Qink name = gesman [10:59:11] Qink name = scfman [11:16:51] Qink name = anlman [11:16:51] Qink name = drvman [11:19:17] Qink name = optman [11:19:18] Qink name = fldman [11:19:18] Qink name = gesman [11:19:19] Qink name = scfman [11:34:53] Qink name = anlman [11:34:53] Qink name = drvman [11:37:15] Qink name = optman [11:37:15] Qink name = fldman [11:37:15] Qink name = gesman [11:37:17] Qink name = scfman [11:51:54] Qink name = anlman [11:51:54] Qink name = drvman [11:54:19] Qink name = optman [11:54:19] Qink name = fldman [11:54:19] Qink name = gesman [11:54:20] Qink name = scfman [12:10:08] Qink name = anlman [12:10:08] Qink name = drvman [12:12:32] Qink name = optman [12:12:32] Qink name = fldman [12:12:32] Qink name = gesman [12:12:33] Qink name = scfman [12:27:12] Qink name = anlman [12:27:12] Qink name = drvman [12:29:28] Qink name = optman [12:29:28] Qink name = fldman [12:29:28] Qink name = gesman [12:29:29] Qink name = scfman [12:45:07] Qink name = anlman [12:45:07] Qink name = drvman [12:47:23] Qink name = optman [12:47:23] Qink name = fldman [12:47:23] Qink name = gesman [12:47:24] Qink name = scfman [13:02:06] Qink name = anlman [13:02:06] Qink name = drvman [13:04:22] Qink name = optman [13:04:22] Qink name = fldman [13:04:22] Qink name = gesman [13:04:23] Qink name = scfman [13:19:25] Qink name = anlman [13:19:25] Qink name = drvman [13:21:40] Qink name 
= optman [13:21:40] Qink name = fldman [13:21:40] Qink name = gesman [13:21:41] Qink name = scfman [13:36:22] Qink name = anlman [13:36:22] Qink name = drvman [13:38:28] Qink name = optman [13:38:28] Qink name = fldman [13:38:28] Qink name = gesman [13:38:30] Qink name = scfman [13:52:27] Qink name = anlman [13:52:27] Qink name = drvman [13:54:39] Qink name = optman [13:54:39] Qink name = fldman [13:54:39] Qink name = gesman [13:54:41] Qink name = scfman </stderr_txt> ]]>
[Edit 3 times, last edit by AgrFan at Oct 24, 2013 11:52:14 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sorry AgrFan, but your problem is unrelated! Signal 11 points at system overload [generally too much disk I/O, or a simple process time-out exceeding 30 seconds]. If yours were the same as the topic, your log would print exactly what the title says: "Maximum disk usage exceeded" **. The event/message log in your case also prints different information.
At any rate, reducing the number of concurrent CEP2 tasks is a working solution, or, as I have done, set BOINC to pause when the [Linux] system load gets above 25%. Usually that only happens when I'm using the system, not when it's idle crunching. Running more than the default of 1 at a time IS a case of trial and error. In fact my octo, on W7-64, is doing 8 concurrently; Windows is much better at handling the large amount of I/O that CEP2 generates. On my [old-ish] Q6600 Linux box I never allow more than 1, as otherwise the efficiency really drops through the floor, even when only idle crunching. Edit: ** The message relates to disk space use at the individual task level, not I/O intensity.
----------------------------------------
[Edit 1 times, last edit by Former Member at Oct 24, 2013 11:58:42 AM] |
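As a side note on the `process got signal 11` messages quoted earlier: the number only names the signal, it does not say what triggered it. A minimal way to look the name up, assuming a standard Python 3 install on the host in question (the helper `signal_name` is just an illustrative name):

```python
import signal

def signal_name(number):
    """Return the symbolic name for a signal number, or 'unknown' if there is none."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"

print(signal_name(11))
```

On Linux this prints `SIGSEGV`. Whether the underlying cause is I/O pressure, memory, or something in the application itself cannot be read from the number alone.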
||
|
|
branjo
Master Cruncher Slovakia Joined: Jun 29, 2012 Post Count: 1892 Status: Offline Project Badges:
|
The 4th one is "Maximum disk usage exceeded"
----------------------------------------
Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006 |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Is it conceivable that the very big intermediate result file of one task caused an aggravated domino effect on the others? Many Linux reports have, though, added up to that platform showing poor performance/efficiency for this science... Just running the default of 1 at a time, a default for a reason, is the established 'preventative' setting to avoid turning a client into a crash test dummy, particularly when the machine is being 'used'.
|
||
|
|
AgrFan
Senior Cruncher USA Joined: Apr 17, 2008 Post Count: 397 Status: Offline Project Badges:
|
I haven't had any problems running 4 units concurrently on my two Linux boxes until the larger units arrived. I've successfully completed 8-12 units in the past 24 hours using the higher max disk usage setting. No errors since increasing the setting from 10GB to 20GB. Looking good so far ...
----------------------------------------
[Edit 1 times, last edit by AgrFan at Oct 25, 2013 11:51:17 AM] |
||
|
|
widdershins
Veteran Cruncher Scotland Joined: Apr 30, 2007 Post Count: 677 Status: Offline Project Badges:
|
I've been running Linux with 8 CEP WUs at a time for quite a while now (trying to get 5 years before it ends). Today I got my first Max usage message (that I know of).
The BOINC message was:
Sat 26 Oct 2013 13:11:22 BST World Community Grid Aborting task E216475_342_I.42.C31F6H14N4O.00377081.0.set1d06_4: exceeded disk limit: 2079.75MB > 2048.00MB
Sat 26 Oct 2013 13:11:25 BST World Community Grid Computation for task E216475_342_I.42.C31F6H14N4O.00377081.0.set1d06_4 finished
Sat 26 Oct 2013 13:11:25 BST World Community Grid Output file E216475_342_I.42.C31F6H14N4O.00377081.0.set1d06_4_0 for task E216475_342_I.42.C31F6H14N4O.00377081.0.set1d06_4 absent
Sat 26 Oct 2013 13:11:25 BST World Community Grid Output file E216475_342_I.42.C31F6H14N4O.00377081.0.set1d06_4_1 for task E216475_342_I.42.C31F6H14N4O.00377081.0.set1d06_4 absent
Sat 26 Oct 2013 13:11:25 BST World Community Grid Output file E216475_342_I.42.C31F6H14N4O.00377081.0.set1d06_4_2 for task E216475_342_I.42.C31F6H14N4O.00377081.0.set1d06_4 absent
The unit error log was:
Result Name: E216475_ 342_ I.42.C31F6H14N4O.00377081.0.set1d06_ 4-- <core_client_version>6.10.17</core_client_version> <![CDATA[ <message> Maximum disk usage exceeded </message> <stderr_txt> INFO: No state to restore. Start from the beginning. [02:19:40] Number of jobs = 16 [02:19:40] Starting job 0,CPU time has been restored to 0.000000. [02:19:40] Starting new Job [02:19:40] Qink name = fldman [02:19:40] Qink name = gesman [02:19:40] Qink name = scfman [02:24:32] Qink name = anlman [02:24:40] End of Job [02:24:40] Finished Job #0 [02:24:40] Starting job 1,CPU time has been restored to 269.350000. [02:24:40] Starting new Job [02:24:40] Qink name = fldman [02:24:42] Qink name = gesman [02:24:42] Qink name = scfman [02:38:38] Qink name = anlman [02:42:53] End of Job [02:42:56] Finished Job #1 [02:42:56] Starting job 2,CPU time has been restored to 1287.310000.
[02:42:56] Starting new Job [02:42:56] Qink name = fldman [02:42:58] Qink name = gesman [02:42:58] Qink name = scfman [02:54:05] Qink name = anlman [02:54:05] Qink name = drvman [02:56:38] Qink name = optman [02:56:38] Qink name = fldman [02:56:38] Qink name = gesman [02:56:40] Qink name = scfman [03:14:29] Qink name = anlman [03:14:29] Qink name = drvman [03:16:57] Qink name = optman [03:16:57] Qink name = fldman [03:16:57] Qink name = gesman [03:16:59] Qink name = scfman [03:34:22] Qink name = anlman [03:34:22] Qink name = drvman [03:36:54] Qink name = optman [03:36:54] Qink name = fldman [03:36:54] Qink name = gesman [03:36:56] Qink name = scfman [03:53:40] Qink name = anlman [03:53:40] Qink name = drvman [03:56:06] Qink name = optman [03:56:06] Qink name = fldman [03:56:06] Qink name = gesman [03:56:08] Qink name = scfman [04:11:42] Qink name = anlman [04:11:42] Qink name = drvman [04:14:01] Qink name = optman [04:14:01] Qink name = fldman [04:14:01] Qink name = gesman [04:14:03] Qink name = scfman [04:29:27] Qink name = anlman [04:29:27] Qink name = drvman [04:31:54] Qink name = optman [04:31:54] Qink name = fldman [04:31:54] Qink name = gesman [04:31:56] Qink name = scfman [04:47:45] Qink name = anlman [04:47:45] Qink name = drvman [04:50:08] Qink name = optman [04:50:08] Qink name = fldman [04:50:08] Qink name = gesman [04:50:10] Qink name = scfman [05:05:13] Qink name = anlman [05:05:13] Qink name = drvman [05:07:35] Qink name = optman [05:07:35] Qink name = fldman [05:07:35] Qink name = gesman [05:07:37] Qink name = scfman [05:22:50] Qink name = anlman [05:22:50] Qink name = drvman [05:25:13] Qink name = optman [05:25:13] Qink name = fldman [05:25:13] Qink name = gesman [05:25:15] Qink name = scfman [05:40:11] Qink name = anlman [05:40:11] Qink name = drvman [05:42:29] Qink name = optman [05:42:29] Qink name = fldman [05:42:29] Qink name = gesman [05:42:31] Qink name = scfman [05:57:33] Qink name = anlman [05:57:33] Qink name = drvman [05:59:59] Qink name 
= optman [06:00:00] Qink name = fldman [06:00:00] Qink name = gesman [06:00:01] Qink name = scfman [06:15:14] Qink name = anlman [06:15:14] Qink name = drvman [06:17:34] Qink name = optman [06:17:34] Qink name = fldman [06:17:34] Qink name = gesman [06:17:36] Qink name = scfman [06:33:47] Qink name = anlman [06:33:47] Qink name = drvman [06:36:02] Qink name = optman [06:36:03] Qink name = fldman [06:36:03] Qink name = gesman [06:36:04] Qink name = scfman [06:49:56] Qink name = anlman [06:49:56] Qink name = drvman [06:52:16] Qink name = optman [06:52:17] Qink name = fldman [06:52:17] Qink name = gesman [06:52:18] Qink name = scfman [07:04:49] Qink name = anlman [07:04:49] Qink name = drvman [07:07:21] Qink name = optman [07:07:21] Qink name = fldman [07:07:21] Qink name = gesman [07:07:23] Qink name = scfman [07:19:52] Qink name = anlman [07:19:52] Qink name = drvman [07:22:08] Qink name = optman [07:22:08] Qink name = fldman [07:22:08] Qink name = gesman [07:22:10] Qink name = scfman [07:33:53] Qink name = anlman [07:33:54] Qink name = drvman [07:36:10] Qink name = optman [07:36:10] Qink name = fldman [07:36:10] Qink name = gesman [07:36:12] Qink name = scfman [07:46:38] Qink name = anlman [07:46:38] Qink name = drvman [07:48:52] Qink name = optman [07:48:52] Qink name = anlman [07:52:53] End of Job [07:52:54] Finished Job #2 [07:52:54] Starting job 3,CPU time has been restored to 18242.430000. [07:52:55] Starting new Job [07:52:55] Qink name = fldman [07:52:56] Qink name = gesman [07:52:56] Qink name = scfman [08:08:23] Qink name = anlman [08:12:26] End of Job [08:12:28] Finished Job #3 [08:12:28] Starting job 4,CPU time has been restored to 19325.380000. [08:12:28] Starting new Job [08:12:28] Qink name = fldman [08:12:30] Qink name = gesman [08:12:30] Qink name = scfman [08:26:07] Qink name = anlman [08:29:48] End of Job [08:29:50] Finished Job #4 [08:29:50] Starting job 5,CPU time has been restored to 20317.040000. 
[08:29:50] Starting new Job [08:29:50] Qink name = fldman [08:29:52] Qink name = gesman [08:29:52] Qink name = scfman [08:43:02] Qink name = anlman [08:46:35] End of Job [08:46:36] Finished Job #5 [08:46:36] Starting job 6,CPU time has been restored to 21272.860000. [08:46:36] Starting new Job [08:46:37] Qink name = fldman [08:46:38] Qink name = gesman [08:46:38] Qink name = scfman [08:59:10] Qink name = anlman [09:02:52] End of Job [09:02:52] Finished Job #6 [09:02:52] Starting job 7,CPU time has been restored to 22205.180000. [09:02:53] Starting new Job [09:02:53] Qink name = fldman [09:02:54] Qink name = gesman [09:02:54] Qink name = scfman [09:21:15] Qink name = anlman [09:24:56] End of Job [09:24:56] Finished Job #7 [09:24:56] Starting job 8,CPU time has been restored to 23466.740000. [09:24:57] Starting new Job [09:24:57] Qink name = fldman [09:24:58] Qink name = gesman [09:24:58] Qink name = scfman [09:37:27] Qink name = anlman [09:41:14] End of Job [09:41:15] Finished Job #8 [09:41:15] Starting job 9,CPU time has been restored to 24408.380000. [09:41:15] Starting new Job [09:41:15] Qink name = fldman [09:41:17] Qink name = gesman [09:41:17] Qink name = scfman [09:54:03] Qink name = anlman [10:00:02] End of Job [10:00:03] Finished Job #9 [10:00:03] Starting job 10,CPU time has been restored to 25496.800000. [10:00:03] Starting new Job [10:00:03] Qink name = fldman [10:00:05] Qink name = gesman [10:00:05] Qink name = scfman [10:31:19] Qink name = anlman [10:37:31] End of Job [10:37:32] Finished Job #10 [10:37:32] Starting job 11,CPU time has been restored to 27653.840000. [10:37:32] Starting new Job [10:37:32] Qink name = fldman [10:37:34] Qink name = gesman [10:37:34] Qink name = scfman [10:54:20] Qink name = anlman [11:00:41] End of Job [11:00:42] Finished Job #11 [11:00:42] Starting job 12,CPU time has been restored to 28975.120000. 
[11:00:42] Starting new Job [11:00:42] Qink name = fldman [11:00:52] Qink name = gesman [11:00:52] Qink name = scfman [12:42:45] Qink name = anlman Abort requested: Exiting </stderr_txt> ]]>
I'm wondering if this is a limit set in the code either to suit the WCG or the scientists' systems, or a legacy limitation buried in the code somewhere. That errant unit was running on a 64-bit machine with an EXT4 filesystem, a 64-bit OS, and a 64-bit BOINC client, so a 2 GB file size limit shouldn't be an issue on my end. It shouldn't be memory related either, because there is over 12 GB still free with 8 CEP WUs running at once. I'd also appreciate it if the techs could see fit to grant credit for these units, as it appears my wingmen are erroring out with the same message and they aren't small units to lose. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It so happens that I just did a clean Ubuntu 13.10 install [on a separate partition, so I can still boot into 13.04 and recover my old BOINC install]. I chose EXT2 as the filesystem for the new Saucy install, to avoid the journalling load that ext3/ext4 incur. I looked 2 days ago, and on my octo with 8 concurrent CEP2 tasks there were 54,000 open files in the slots. Now I'm testing one CEP2 task to see whether EXT2 on an IDE SATA spinner gives better efficiency than the usual 93-95% [when not using the system].
Yes, the application limits the maximum disk use for a task and then bombs. Maybe these will be re-run on the Harvard cluster, or resubmitted when CEP2v2 comes out [make that a 2014 hope], which is meant to bring "giant" models to the grid... an opt-in to an opt-in [hoping that too is done with foresight]. The odd thing is, in a few instances the wingman and the resubmission managed to finish the task without the issue, which makes you wonder about the reproducibility of a simulation... why does one hit this issue and not the next on the same task? edit: clarify a line. edit2: I seem to remember that the techs/cleanenergy said they'd not increase the max disk use for a task. [Edit 2 times, last edit by Former Member at Oct 26, 2013 1:47:00 PM] |
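For anyone who wants to watch how close a running task gets to a disk limit, here is a rough Python sketch that adds up the size of everything in one slot directory and compares it to a bound. The slot path and the bound value are assumptions you would have to supply yourself; nothing here is taken from the CEP2 application or from WCG, and the client may count disk use differently.

```python
import os
import tempfile


def slot_disk_usage(slot_dir: str) -> int:
    """Return the total size in bytes of all regular files under slot_dir."""
    total = 0
    for root, _dirs, files in os.walk(slot_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def exceeds_disk_bound(slot_dir: str, disk_bound_bytes: int) -> bool:
    """True if the slot uses more disk than the (hypothetical) bound."""
    return slot_disk_usage(slot_dir) > disk_bound_bytes


if __name__ == "__main__":
    # Demo on a throwaway directory rather than a real BOINC slot.
    demo_dir = tempfile.mkdtemp()
    with open(os.path.join(demo_dir, "scratch.dat"), "wb") as fh:
        fh.write(b"\0" * 4096)
    print(slot_disk_usage(demo_dir), exceeds_disk_bound(demo_dir, 1024))
```

Pointing `slot_disk_usage` at a real slot directory every few minutes would show whether the erroring tasks grow steadily or jump in size in the last job.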
||
|
|
widdershins
Veteran Cruncher Scotland Joined: Apr 30, 2007 Post Count: 677 Status: Offline Project Badges:
|
I found this via Google. It's from a document detailing how to create BOINC projects.
Page 5
File properties
Files have various properties, including:
Name: unique identifier for the file.
Sticky: don't delete file on client (see below).
Report on RPC: include a description of this file in scheduler requests.
Maximum size: if an output file exceeds its maximum size, the computation is aborted.
File properties are specified in XML elements that appear, for example, in workunit templates.
So it looks like the ball is in the techs' court: either make the WUs smaller or the file size limit larger. [Edit 1 times, last edit by widdershins at Oct 26, 2013 1:13:38 PM] |
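To make the "maximum size" property concrete, here is a small Python sketch that reads a per-file limit out of a template and checks a result size against it. The template below is a simplified, made-up fragment: as far as I know `max_nbytes` is the element BOINC uses for this limit, but the file name and the 50,000,000 byte value are purely illustrative and are not the real CEP2 settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified output template; real templates contain more
# elements and the real limit for CEP2 is not known from this thread.
TEMPLATE = """
<output_template>
  <file_info>
    <name>result_0</name>
    <max_nbytes>50000000</max_nbytes>
  </file_info>
</output_template>
"""


def max_sizes(template_xml: str) -> dict:
    """Map each output file name to its maximum size in bytes."""
    root = ET.fromstring(template_xml)
    limits = {}
    for info in root.iter("file_info"):
        name = info.findtext("name")
        limit = info.findtext("max_nbytes")
        if name is not None and limit is not None:
            limits[name.strip()] = int(float(limit))
    return limits


def would_abort(template_xml: str, file_name: str, size_bytes: int) -> bool:
    """True if the file is listed and is larger than its limit."""
    limit = max_sizes(template_xml).get(file_name)
    return limit is not None and size_bytes > limit


if __name__ == "__main__":
    # A 54 MiB upload against the made-up 50,000,000 byte limit.
    print(would_abort(TEMPLATE, "result_0", 54 * 1024 * 1024))
```

If the limit really lives in the template, then raising it or shrinking the result are server-side changes, which fits the point that there is nothing to fix on the host.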
||
|
|
|