World Community Grid Forums
Thread Status: Active Total posts in this thread: 29
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I'm just wondering whether the two errors of this kind in a row that I've had since switching to the new Linux kernel are related to it.
Has anybody had a similar experience?
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
If you go to the Result Status page, click the error link and post the log here, we can tell you more. CEP2 does not cope well with heavily loaded systems, particularly on Linux.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
This is what I got:
World Community Grid Result Log Result Name: E209820_ 227_ C.29.C26H14SSe2.01903845.4.set1d06_ 0-- <core_client_version>7.0.35</core_client_version> <![CDATA[ <message> process got signal 11 </message> <stderr_txt> INFO: No state to restore. Start from the beginning. [13:18:05] Number of jobs = 16 [13:18:05] Starting job 0,CPU time has been restored to 0.000000. [13:18:05] Starting new Job [13:18:07] Qink name = fldman [13:18:13] Qink name = gesman [13:18:13] Qink name = scfman [13:25:02] Qink name = anlman [13:25:12] End of Job [13:25:13] Finished Job #0 [13:25:13] Starting job 1,CPU time has been restored to 361.102104. [13:25:13] Starting new Job [13:25:13] Qink name = fldman [13:25:16] Qink name = gesman [13:25:18] Qink name = scfman [13:53:55] Qink name = anlman [13:56:32] End of Job [13:56:33] Finished Job #1 [13:56:33] Starting job 2,CPU time has been restored to 2071.366104. [13:56:34] Starting new Job [13:56:34] Qink name = fldman [13:56:36] Qink name = gesman [13:56:37] Qink name = scfman [15:58:32] Qink name = anlman [15:58:32] Qink name = drvman [16:03:06] Qink name = optman [16:03:06] Qink name = fldman [16:03:06] Qink name = gesman [16:03:08] Qink name = scfman [16:33:07] Qink name = anlman [16:33:07] Qink name = drvman [16:38:16] Qink name = optman [16:38:16] Qink name = fldman [16:38:16] Qink name = gesman [16:38:23] Qink name = scfman [17:12:21] Qink name = anlman [17:12:21] Qink name = drvman [17:16:45] Qink name = optman [17:16:46] Qink name = fldman [17:16:46] Qink name = gesman [17:16:56] Qink name = scfman [17:49:44] Qink name = anlman [17:49:44] Qink name = drvman [17:55:41] Qink name = optman [17:55:41] Qink name = fldman [17:55:41] Qink name = gesman [17:55:45] Qink name = scfman [18:25:51] Qink name = anlman [18:25:51] Qink name = drvman [18:28:42] Qink name = optman [18:28:42] Qink name = fldman [18:28:42] Qink name = gesman [18:28:48] Qink name = scfman Quit requested: Exiting [18:34:20] Number of jobs = 16 [18:34:20] Starting job 2,CPU 
time has been restored to 2071.366104. [18:34:29] Starting new Job [18:34:30] Qink name = fldman [18:34:33] Qink name = gesman [18:34:33] Qink name = scfman Quit requested: Exiting [18:39:34] Number of jobs = 16 [18:39:35] Starting job 2,CPU time has been restored to 2071.366104. [18:39:51] Starting new Job [18:39:52] Qink name = fldman [18:39:59] Qink name = gesman [18:39:59] Qink name = scfman [18:58:27] Qink name = anlman [18:58:27] Qink name = drvman [19:04:00] Qink name = optman [19:04:00] Qink name = fldman [19:04:00] Qink name = gesman [19:04:03] Qink name = scfman [19:37:57] Qink name = anlman [19:37:57] Qink name = drvman [19:43:48] Qink name = optman [19:43:48] Qink name = fldman [19:43:48] Qink name = gesman [19:43:51] Qink name = scfman [20:12:52] Qink name = anlman [20:12:52] Qink name = drvman [20:18:43] Qink name = optman [20:18:43] Qink name = fldman [20:18:43] Qink name = gesman [20:18:48] Qink name = scfman [20:52:49] Qink name = anlman [20:52:49] Qink name = drvman [20:58:36] Qink name = optman [20:58:36] Qink name = fldman [20:58:36] Qink name = gesman [20:58:40] Qink name = scfman [21:29:38] Qink name = anlman [21:29:38] Qink name = drvman [21:35:28] Qink name = optman [21:35:28] Qink name = fldman [21:35:28] Qink name = gesman [21:35:32] Qink name = scfman [22:07:26] Qink name = anlman [22:07:26] Qink name = drvman [22:13:13] Qink name = optman [22:13:13] Qink name = fldman [22:13:13] Qink name = gesman [22:13:17] Qink name = scfman [22:44:29] Qink name = anlman [22:44:29] Qink name = drvman [22:48:56] Qink name = optman [22:49:00] Qink name = fldman [22:49:00] Qink name = gesman [22:49:05] Qink name = scfman [23:20:08] Qink name = anlman [23:20:09] Qink name = drvman [23:24:36] Qink name = optman [23:24:36] Qink name = fldman [23:24:36] Qink name = gesman [23:24:39] Qink name = scfman [19:01:03] Qink name = anlman [19:01:03] Qink name = drvman [19:07:19] Qink name = optman [19:07:21] Qink name = fldman [19:07:21] Qink name = gesman [19:07:24] 
Qink name = scfman [19:29:06] Qink name = anlman [19:29:06] Qink name = drvman [19:41:43] Qink name = optman [19:41:43] Qink name = fldman [19:41:43] Qink name = gesman [19:41:49] Qink name = scfman [20:15:21] Qink name = anlman [20:15:21] Qink name = drvman [20:19:56] Qink name = optman [20:19:59] Qink name = fldman [20:19:59] Qink name = gesman [20:20:06] Qink name = scfman [20:53:16] Qink name = anlman [20:53:16] Qink name = drvman [20:58:51] Qink name = optman [20:58:52] Qink name = fldman [20:58:52] Qink name = gesman [20:58:56] Qink name = scfman [21:22:33] Qink name = anlman [21:22:34] Qink name = drvman [21:28:21] Qink name = optman [21:28:21] Qink name = anlman [21:30:30] End of Job [21:30:31] Finished Job #2 [21:30:31] Starting job 3,CPU time has been restored to 23635.675830. Quit requested: Exiting [17:53:33] Number of jobs = 16 [17:53:33] Starting job 3,CPU time has been restored to 23635.675830. [17:53:47] Starting new Job [17:53:48] Qink name = fldman [17:53:55] Qink name = gesman [17:53:57] Qink name = scfman [18:21:19] Qink name = anlman [18:23:25] End of Job [18:23:28] Finished Job #3 [18:23:28] Starting job 4,CPU time has been restored to 24753.968823. [18:23:29] Starting new Job [18:23:29] Qink name = fldman [18:23:34] Qink name = gesman [18:23:35] Qink name = scfman [18:43:26] Qink name = anlman [18:45:33] End of Job [18:45:35] Finished Job #4 [18:45:35] Starting job 5,CPU time has been restored to 25941.593276. [18:45:35] Starting new Job [18:45:35] Qink name = fldman [18:45:37] Qink name = gesman [18:45:40] Qink name = scfman [19:11:36] Qink name = anlman [19:13:01] End of Job [19:13:02] Finished Job #5 [19:13:02] Starting job 6,CPU time has been restored to 27504.137733. [19:13:02] Starting new Job [19:13:02] Qink name = fldman [19:13:04] Qink name = gesman [19:13:06] Qink name = scfman [19:31:20] Qink name = anlman [19:33:29] End of Job [19:33:31] Finished Job #6 [19:33:31] Starting job 7,CPU time has been restored to 28576.243748. 
[19:33:31] Starting new Job [19:33:31] Qink name = fldman [19:33:34] Qink name = gesman [19:33:35] Qink name = scfman [20:07:25] Qink name = anlman [20:09:34] End of Job [20:09:36] Finished Job #7 [20:09:36] Starting job 8,CPU time has been restored to 30555.443863. [20:09:36] Starting new Job [20:09:36] Qink name = fldman [20:09:38] Qink name = gesman [20:09:39] Qink name = scfman [20:32:40] Qink name = anlman [20:34:16] End of Job [20:34:17] Finished Job #8 [20:34:17] Starting job 9,CPU time has been restored to 31985.825411. [20:34:17] Starting new Job [20:34:17] Qink name = fldman [20:34:20] Qink name = gesman [20:34:20] Qink name = scfman [20:54:54] Qink name = anlman [20:58:36] End of Job [20:58:37] Finished Job #9 [20:58:37] Starting job 10,CPU time has been restored to 33378.161743. [20:58:38] Starting new Job [20:58:38] Qink name = fldman [20:58:41] Qink name = gesman [20:58:42] Qink name = scfman Quit requested: Exiting [18:02:50] Number of jobs = 16 [18:02:50] Starting job 10,CPU time has been restored to 33378.161743. [18:03:18] Starting new Job [18:03:19] Qink name = fldman Parent was killed, exiting </stderr_txt> ]]> |
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Signal 11 and "Parent was killed", which probably happened while job #10 was underway, are signs of an overloaded system and are not specific to any Linux kernel. On Ubuntu 11.10 I have BOINC set to suspend computing when non-BOINC CPU usage is above 35%, with "Leave applications in memory while suspended" also enabled. This pauses BOINC in 10-second intervals for as long as the system needs the CPU. It's better to have CEP2 pause for a few minutes than to crash or fall back to its last checkpoint, which can be hours of computing time back.
B2I
Senior Cruncher USA Joined: Jan 23, 2011 Post Count: 232 Status: Offline
I'm getting the same thing on my Linux machine. It runs Mint 13 LTS with the latest kernel, updated yesterday.
----------------------------------------Result Name: E209864_ 162_ C.32.C27H13NO3S.01838376.1.set1d06_ 0-- <core_client_version>7.0.27</core_client_version> <![CDATA[ <message> process got signal 11 </message> <stderr_txt> INFO: No state to restore. Start from the beginning. [08:06:53] Number of jobs = 16 [08:06:53] Starting job 0,CPU time has been restored to 0.000000. [08:06:53] Starting new Job [08:06:53] Qink name = fldman [08:06:54] Qink name = gesman [08:06:54] Qink name = scfman [08:08:32] Qink name = anlman [08:08:33] End of Job [08:08:34] Finished Job #0 [08:08:34] Starting job 1,CPU time has been restored to 91.425713. [08:08:34] Starting new Job [08:08:34] Qink name = fldman [08:08:35] Qink name = gesman [08:08:35] Qink name = scfman [08:14:36] Qink name = anlman [08:15:14] End of Job [08:15:15] Finished Job #1 [08:15:15] Starting job 2,CPU time has been restored to 458.508654. [08:15:16] Starting new Job [08:15:16] Qink name = fldman [08:15:18] Qink name = gesman [08:15:18] Qink name = scfman [08:20:14] Qink name = anlman [08:20:14] Qink name = drvman [08:21:28] Qink name = optman [08:21:28] Qink name = fldman [08:21:28] Qink name = gesman [08:21:29] Qink name = scfman [08:29:43] Qink name = anlman [08:29:43] Qink name = drvman [08:30:55] Qink name = optman [08:30:55] Qink name = fldman [08:30:55] Qink name = gesman [08:30:56] Qink name = scfman [08:38:42] Qink name = anlman [08:38:42] Qink name = drvman [08:39:57] Qink name = optman [08:39:57] Qink name = fldman [08:39:57] Qink name = gesman [08:39:57] Qink name = scfman [08:47:48] Qink name = anlman [08:47:48] Qink name = drvman [08:49:03] Qink name = optman [08:49:04] Qink name = fldman [08:49:04] Qink name = gesman [08:49:04] Qink name = scfman [08:56:44] Qink name = anlman [08:56:44] Qink name = drvman [08:57:53] Qink name = optman [08:57:53] Qink name = fldman [08:57:53] Qink name = gesman [08:57:54] Qink name = scfman [09:05:32] Qink name = anlman [09:05:32] Qink name = drvman [09:06:43] Qink 
name = optman [09:06:43] Qink name = fldman [09:06:43] Qink name = gesman [09:06:44] Qink name = scfman [09:14:23] Qink name = anlman [09:14:23] Qink name = drvman [09:15:35] Qink name = optman [09:15:35] Qink name = fldman [09:15:35] Qink name = gesman [09:15:36] Qink name = scfman [09:23:19] Qink name = anlman [09:23:19] Qink name = drvman [09:24:28] Qink name = optman [09:24:28] Qink name = fldman [09:24:28] Qink name = gesman [09:24:29] Qink name = scfman [09:31:55] Qink name = anlman [09:31:55] Qink name = drvman [09:33:08] Qink name = optman [09:33:08] Qink name = fldman [09:33:08] Qink name = gesman [09:33:09] Qink name = scfman [09:40:12] Qink name = anlman [09:40:12] Qink name = drvman [09:41:26] Qink name = optman [09:41:26] Qink name = fldman [09:41:26] Qink name = gesman [09:41:27] Qink name = scfman [09:47:11] Qink name = anlman [09:47:11] Qink name = drvman [09:48:22] Qink name = optman [09:48:22] Qink name = fldman [09:48:22] Qink name = gesman [09:48:23] Qink name = scfman [09:54:21] Qink name = anlman [09:54:21] Qink name = drvman [09:55:32] Qink name = optman [09:55:32] Qink name = fldman [09:55:32] Qink name = gesman [09:55:33] Qink name = scfman [10:01:50] Qink name = anlman [10:01:50] Qink name = drvman [10:03:01] Qink name = optman [10:03:01] Qink name = fldman [10:03:01] Qink name = gesman [10:03:01] Qink name = scfman [10:08:53] Qink name = anlman [10:08:53] Qink name = drvman [10:10:04] Qink name = optman [10:10:05] Qink name = fldman [10:10:05] Qink name = gesman [10:10:05] Qink name = scfman [10:15:47] Qink name = anlman [10:15:47] Qink name = drvman [10:16:59] Qink name = optman [10:16:59] Qink name = fldman [10:16:59] Qink name = gesman [10:16:59] Qink name = scfman [10:22:13] Qink name = anlman [10:22:13] Qink name = drvman [10:23:24] Qink name = optman [10:23:24] Qink name = anlman [10:24:05] End of Job [10:24:06] Finished Job #2 [10:24:06] Starting job 3,CPU time has been restored to 7309.464812. 
[10:24:06] Starting new Job [10:24:06] Qink name = fldman [10:24:07] Qink name = gesman [10:24:07] Qink name = scfman [10:30:20] Qink name = anlman [10:31:03] End of Job [10:31:04] Finished Job #3 [10:31:04] Starting job 4,CPU time has been restored to 7690.196606. [10:31:04] Starting new Job [10:31:05] Qink name = fldman [10:31:05] Qink name = gesman [10:31:05] Qink name = scfman [10:36:25] Qink name = anlman [10:37:06] End of Job [10:37:07] Finished Job #4 [10:37:07] Starting job 5,CPU time has been restored to 8039.822456. [10:37:07] Starting new Job [10:37:08] Qink name = fldman [10:37:08] Qink name = gesman [10:37:08] Qink name = scfman [10:42:43] Qink name = anlman [10:43:28] End of Job [10:43:29] Finished Job #5 [10:43:29] Starting job 6,CPU time has been restored to 8404.053219. [10:43:29] Starting new Job [10:43:29] Qink name = fldman [10:43:30] Qink name = gesman [10:43:30] Qink name = scfman [10:48:46] Qink name = anlman [10:49:27] End of Job [10:49:27] Finished Job #6 [10:49:27] Starting job 7,CPU time has been restored to 8750.382863. [10:49:27] Starting new Job [10:49:27] Qink name = fldman [10:49:28] Qink name = gesman [10:49:28] Qink name = scfman [10:57:13] Qink name = anlman [10:58:30] End of Job [10:58:31] Finished Job #7 [10:58:31] Starting job 8,CPU time has been restored to 9271.935458. [10:58:31] Starting new Job [10:58:31] Qink name = fldman [10:58:31] Qink name = gesman [10:58:31] Qink name = scfman [11:03:48] Qink name = anlman [11:04:27] End of Job [11:04:31] Finished Job #8 [11:04:31] Starting job 9,CPU time has been restored to 9623.405423. [11:04:31] Starting new Job [11:04:31] Qink name = fldman [11:04:31] Qink name = gesman [11:04:31] Qink name = scfman [11:10:01] Qink name = anlman [11:11:02] End of Job [11:11:04] Finished Job #9 [11:11:04] Starting job 10,CPU time has been restored to 10010.253599. 
[11:11:04] Starting new Job [11:11:04] Qink name = fldman [11:11:05] Qink name = gesman [11:11:05] Qink name = scfman Parent was killed, exiting </stderr_txt> ]]>
I'm also getting errors on some of my Windows machines. I've been running only CEP2 for about six weeks, and the errors started just 3-4 days ago.
B2I
Senior Cruncher USA Joined: Jan 23, 2011 Post Count: 232 Status: Offline
I also have another anomaly. It probably has nothing to do with WCG or BOINC, but let me know if there could be any connection to this issue. A two-year-old i7 950 quad-core with a mild overclock (3.9 GHz), which normally runs a CEP2 work unit in 9-11 hours, is now taking 38-45 hours per work unit. It has plenty of memory (24 GB) and a 1 TB 7200 RPM SATA II hard drive. My other Sandy Bridge machines are doing fine.
B2I
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7849 Status: Offline
Is that clock time or CPU time?
Sgt. Joe
*Minnesota Crunchers*
B2I
Senior Cruncher USA Joined: Jan 23, 2011 Post Count: 232 Status: Offline
Good question. I was simply looking at the "To completion" column. When I went to the results page and checked several of the Valid and Pending results, the times were in the normal or better-than-normal range of 6-9 hours. I'll time a unit with a clock to determine the actual elapsed time.
I reset the project last night, but the results were the same. Thanks for the help.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Well, the system isn't too busy, but it boots from an external USB drive, so yes, accessing certain data can sometimes take longer than usual.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I have Ubuntu 12.04 on a USB 3.0 SanDisk stick and consistently saw HCMD2 and CEP2 fail, while all other sciences ran fine. BOINC was set with an enforced start delay (set in cc_config.xml with a <start_delay>xxx</start_delay> line in the options section, value in seconds). Maybe try that as a possible mitigation.
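For reference, the start delay sits in the <options> section of cc_config.xml in the BOINC data directory, and because it delays applications after client startup, it only takes effect the next time the client starts. As a minimal sketch, the Python below builds such a file. The 120-second value is only an example, `make_cc_config` is a hypothetical helper, and writing its output would replace any existing cc_config.xml, so if you already have one, add the line to its existing options section by hand instead.

```python
import xml.etree.ElementTree as ET


def make_cc_config(start_delay_seconds: int) -> str:
    """Return a minimal cc_config.xml body with a start delay (hypothetical helper)."""
    if start_delay_seconds < 0:
        raise ValueError("start delay must be non-negative")
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # <start_delay>: seconds the client waits after starting before it runs applications.
    ET.SubElement(options, "start_delay").text = str(start_delay_seconds)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example value only; save the output as cc_config.xml in the BOINC data directory.
    print(make_cc_config(120))
    # prints: <cc_config><options><start_delay>120</start_delay></options></cc_config>
```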