World Community Grid Forums
Category: Completed Research Forum: The Clean Energy Project - Phase 2 Forum Thread: 16 Tasks, 15 Complete, 1 Error, No Credit, Start Again, what a waste of resources. |
Thread Status: Active Total posts in this thread: 7
|
Author |
|
sk..
Master Cruncher Joined: Mar 22, 2007 Post Count: 2324 Status: Offline Project Badges: |
I have run several of these WUs only to have them error some way into their run. As these WUs comprise 16 tasks, why is credit not awarded for partially completed work? I don't see why we would not be credited for completing 15 tasks out of the 16, especially if there is a time limit for some, which would mean they never even try to complete the last tasks.
- It's like making cars in a manufacturing plant, but at the end of the working day the cars in progress just get scrapped. Result Name: E200465_ 273_ A.25.C21H13NO2S.70.2.set1d06_ 0-- <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> INFO: No state to restore. Start from the beginning. [19:50:12] Number of jobs = 16 [19:50:12] Starting job 0,CPU time has been restored to 0.000000. [08:40:51] Starting new Job [08:40:51] Qink name = fldman [08:40:51] Qink name = gesman [08:40:51] Qink name = scfman [08:42:38] Qink name = anlman [08:42:39] End of Job [08:42:41] Finished Job #0 [08:42:41] Starting job 1,CPU time has been restored to 96.580000. [08:42:41] Starting new Job [08:42:41] Qink name = fldman [08:42:42] Qink name = gesman [08:42:42] Qink name = scfman [08:47:01] Qink name = anlman [08:47:33] End of Job [08:47:36] Finished Job #1 [08:47:36] Starting job 2,CPU time has been restored to 370.710000. [08:47:36] Starting new Job [08:47:36] Qink name = fldman [08:47:37] Qink name = gesman [08:47:37] Qink name = scfman [08:51:25] Qink name = anlman [08:51:25] Qink name = drvman [08:52:21] Qink name = optman [08:52:21] Qink name = fldman [08:52:21] Qink name = gesman [08:52:22] Qink name = scfman [08:58:46] Qink name = anlman [08:58:46] Qink name = drvman [08:59:40] Qink name = optman [08:59:41] Qink name = fldman [08:59:41] Qink name = gesman [08:59:41] Qink name = scfman [09:06:14] Qink name = anlman [09:06:14] Qink name = drvman [09:07:09] Qink name = optman [09:07:09] Qink name = fldman [09:07:09] Qink name = gesman [09:07:10] Qink name = scfman [09:13:44] Qink name = anlman [09:13:44] Qink name = drvman [09:14:40] Qink name = optman [09:14:40] Qink name = fldman [09:14:40] Qink name = gesman [09:14:41] Qink name = scfman [09:20:58] Qink name = anlman [09:20:58] Qink name = drvman [09:21:52] Qink name = optman [09:21:52] Qink name = fldman [09:21:52] Qink name = gesman [09:21:52] 
Qink name = scfman [09:28:08] Qink name = anlman [09:28:08] Qink name = drvman [09:29:06] Qink name = optman [09:29:06] Qink name = fldman [09:29:06] Qink name = gesman [09:29:06] Qink name = scfman [09:35:18] Qink name = anlman [09:35:18] Qink name = drvman [09:36:12] Qink name = optman [09:36:12] Qink name = fldman [09:36:12] Qink name = gesman [09:36:13] Qink name = scfman [09:42:18] Qink name = anlman [09:42:18] Qink name = drvman [09:43:13] Qink name = optman [09:43:13] Qink name = fldman [09:43:13] Qink name = gesman [09:43:13] Qink name = scfman [09:49:17] Qink name = anlman [09:49:17] Qink name = drvman [09:50:13] Qink name = optman [09:50:13] Qink name = fldman [09:50:13] Qink name = gesman [09:50:13] Qink name = scfman [09:56:18] Qink name = anlman [09:56:18] Qink name = drvman [09:57:13] Qink name = optman [09:57:13] Qink name = fldman [09:57:13] Qink name = gesman [09:57:14] Qink name = scfman [10:03:39] Qink name = anlman [10:03:39] Qink name = drvman [10:04:34] Qink name = optman [10:04:34] Qink name = fldman [10:04:34] Qink name = gesman [10:04:35] Qink name = scfman [10:10:36] Qink name = anlman [10:10:36] Qink name = drvman [10:11:32] Qink name = optman [10:11:32] Qink name = fldman [10:11:32] Qink name = gesman [10:11:32] Qink name = scfman [10:17:34] Qink name = anlman [10:17:34] Qink name = drvman [10:18:29] Qink name = optman [10:18:29] Qink name = fldman [10:18:29] Qink name = gesman [10:18:30] Qink name = scfman [10:24:28] Qink name = anlman [10:24:28] Qink name = drvman [10:25:23] Qink name = optman [10:25:23] Qink name = fldman [10:25:23] Qink name = gesman [10:25:23] Qink name = scfman [10:30:21] Qink name = anlman [10:30:21] Qink name = drvman [10:31:16] Qink name = optman [10:31:16] Qink name = fldman [10:31:16] Qink name = gesman [10:31:16] Qink name = scfman [10:36:19] Qink name = anlman [10:36:19] Qink name = drvman [10:37:14] Qink name = optman [10:37:14] Qink name = fldman [10:37:14] Qink name = gesman [10:37:14] Qink name = scfman 
[10:41:53] Qink name = anlman [10:41:53] Qink name = drvman [10:42:48] Qink name = optman [10:42:48] Qink name = fldman [10:42:48] Qink name = gesman [10:42:49] Qink name = scfman [10:47:49] Qink name = anlman [10:47:49] Qink name = drvman [10:48:43] Qink name = optman [10:48:43] Qink name = fldman [10:48:43] Qink name = gesman [10:48:44] Qink name = scfman [10:53:41] Qink name = anlman [10:53:41] Qink name = drvman [10:54:36] Qink name = optman [10:54:36] Qink name = fldman [10:54:36] Qink name = gesman [10:54:37] Qink name = scfman [10:58:46] Qink name = anlman [10:58:46] Qink name = drvman [10:59:42] Qink name = optman [10:59:42] Qink name = fldman [10:59:42] Qink name = gesman [10:59:43] Qink name = scfman [11:03:46] Qink name = anlman [11:03:46] Qink name = drvman [11:04:41] Qink name = optman [11:04:41] Qink name = anlman [11:05:14] End of Job [11:05:17] Finished Job #2 [11:05:17] Starting job 3,CPU time has been restored to 8001.800000. [11:05:17] Starting new Job [11:05:17] Qink name = fldman [11:05:18] Qink name = gesman [11:05:18] Qink name = scfman [11:10:31] Qink name = anlman [11:11:03] End of Job [11:11:06] Finished Job #3 [11:11:06] Starting job 4,CPU time has been restored to 8323.360000. [11:11:06] Starting new Job [11:11:06] Qink name = fldman [11:11:07] Qink name = gesman [11:11:07] Qink name = scfman [11:15:54] Qink name = anlman [11:16:27] End of Job [11:16:30] Finished Job #4 [11:16:30] Starting job 5,CPU time has been restored to 8593.710000. [11:16:30] Starting new Job [11:16:30] Qink name = fldman [11:16:31] Qink name = gesman [11:16:31] Qink name = scfman [11:20:54] Qink name = anlman [11:21:25] End of Job [11:21:28] Finished Job #5 [11:21:28] Starting job 6,CPU time has been restored to 8872.870000. 
[11:21:28] Starting new Job [11:21:28] Qink name = fldman [11:21:28] Qink name = gesman [11:21:28] Qink name = scfman [11:25:40] Qink name = anlman [11:26:12] End of Job [11:26:15] Finished Job #6 [11:26:15] Starting job 7,CPU time has been restored to 9141.820000. [11:26:15] Starting new Job [11:26:15] Qink name = fldman [11:26:16] Qink name = gesman [11:26:16] Qink name = scfman [11:32:13] Qink name = anlman [11:32:45] End of Job [11:32:47] Finished Job #7 [11:32:47] Starting job 8,CPU time has been restored to 9514.080000. [11:32:48] Starting new Job [11:32:48] Qink name = fldman [11:32:48] Qink name = gesman [11:32:48] Qink name = scfman [11:36:54] Qink name = anlman [11:37:27] End of Job [11:37:30] Finished Job #8 [11:37:30] Starting job 9,CPU time has been restored to 9783.060000. [11:37:30] Starting new Job [11:37:30] Qink name = fldman [11:37:31] Qink name = gesman [11:37:31] Qink name = scfman [11:41:49] Qink name = anlman [11:42:33] End of Job [11:42:36] Finished Job #9 [11:42:36] Starting job 10,CPU time has been restored to 10069.610000. [11:42:36] Starting new Job [11:42:36] Qink name = fldman [11:42:37] Qink name = gesman [11:42:37] Qink name = scfman [11:53:00] Qink name = anlman [11:53:42] End of Job [11:53:45] Finished Job #10 [11:53:45] Starting job 11,CPU time has been restored to 10703.590000. [11:53:45] Starting new Job [11:53:46] Qink name = fldman [11:53:46] Qink name = gesman [11:53:46] Qink name = scfman [11:59:29] Qink name = anlman [12:00:10] End of Job [12:00:13] Finished Job #11 [12:00:13] Starting job 12,CPU time has been restored to 11062.520000. [12:00:13] Starting new Job [12:00:13] Qink name = fldman [12:00:16] Qink name = gesman [12:00:16] Qink name = scfman [12:25:58] Qink name = anlman [12:32:48] End of Job [12:32:52] Finished Job #12 [12:32:52] Starting job 13,CPU time has been restored to 12943.200000. 
[12:32:52] Starting new Job [12:32:52] Qink name = fldman [12:32:55] Qink name = gesman [12:32:55] Qink name = scfman [13:52:40] Qink name = anlman [13:59:23] End of Job [13:59:26] Finished Job #13 [13:59:26] Starting job 14,CPU time has been restored to 17957.610000. [13:59:27] Starting new Job [13:59:27] Qink name = fldman [13:59:30] Qink name = gesman [13:59:30] Qink name = scfman [15:09:47] Qink name = anlman [15:16:03] End of Job [15:16:07] Finished Job #14 [15:16:07] Starting job 15,CPU time has been restored to 22405.540000. [15:16:07] Starting new Job [15:16:07] Qink name = fldman [15:16:10] Qink name = gesman [15:16:10] Qink name = scfman Quit requested: Exiting *** glibc detected *** double free or corruption (fasttop): 0x09dce628 *** SIGABRT: abort called Stack trace (14 frames): [0x806fb4b] [0x80dc524] [0xf778b400] [0x80e7954] [0x80fd917] [0x8102ba9] [0x8102f63] [0x80c9615] [0x80738ae] [0x806e4a2] [0x80e82d7] [0x8054637] [0x80dfe86] [0x8048131] Exiting... </stderr_txt> |
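For anyone wanting to check how far a work unit got before it errored, here is a minimal sketch that pulls the counts out of a stderr dump like the one above. It assumes only the three message formats visible in this log ("Number of jobs = N", "Finished Job #N", "CPU time has been restored to X"); the function name `summarize_stderr` is made up for illustration and is not part of BOINC or the WCG client.

```python
import re


def summarize_stderr(text):
    """Return (jobs_declared, jobs_finished, last_restored_cpu_seconds).

    Hypothetical helper: it only matches the message formats seen in the
    log posted above and knows nothing else about the client.
    """
    declared = re.search(r"Number of jobs = (\d+)", text)
    finished = re.findall(r"Finished Job #(\d+)", text)
    # Match digits.digits so the sentence-ending full stop is not captured.
    restored = re.findall(r"CPU time has been restored to (\d+\.\d+)", text)
    return (
        int(declared.group(1)) if declared else None,
        len(finished),
        float(restored[-1]) if restored else None,
    )


# A short excerpt in the same format as the log above.
sample = (
    "[19:50:12] Number of jobs = 16 "
    "[08:42:41] Finished Job #0 "
    "[08:42:41] Starting job 1,CPU time has been restored to 96.580000. "
    "[08:47:36] Finished Job #1 "
    "[08:47:36] Starting job 2,CPU time has been restored to 370.710000. "
)

total, done, cpu_seconds = summarize_stderr(sample)
print(total, done, cpu_seconds)
```

Run against the full log in this post, the same three patterns would find 16 jobs declared, jobs #0 to #14 finished, and a last restored CPU time of 22405.54 seconds (roughly 6.2 hours) at the point job 15 started.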
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2977 Status: Offline Project Badges: |
skgiven, I'm a bit surprised at you (especially seeing as you're a dedicated cruncher who's already contributed so much), as you don't provide any details of your set-up, memory configuration etc., which might give a clue as to why it aborted...
I do certainly agree though that you should be given some credit, as after all, from [19:50:12] to [15:16:10] the next day is a huge chunk of time... |
||
|
sk..
Master Cruncher Joined: Mar 22, 2007 Post Count: 2324 Status: Offline Project Badges: |
It's the same system as posted in several other threads, but I was not trying to highlight the error and resolve it, just highlighting another credit and run time issue with these WUs.
We are in effect running 16 tasks back to back, but if one fails then, as far as the credit system is concerned, you get nothing. I'm not even sure whether these results are discarded by the researchers or what happens to them, but if they are, it is a complete waste of time, effort, money and resources, which is something I am concerned about. The failure rates, low credit and run time would not occur if these tasks were treated as separate tasks by the credit system. I understand that these were brought together to reduce network bandwidth and server overhead, but should they not just be bundled together for sending and receiving, and split up for running?
----------------------------------------
- Kubuntu 10.04 x64, Q6600, 4GB, 300GB free on drive ;)
[Edit 1 times, last edit by skgiven at Oct 24, 2010 5:35:01 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"It's like making cars in a manufacturing plant, but at the end of the working day the cars in progress just get scrapped"
I think that is a very good analogy for how the current point system works. If at the end of the 'manufacturing' you don't have a working product (i.e. the result is invalid), we get nothing and have to start over. Really, the points are for results, not for effort, except in the case of Betas, where there are points for effort AND results. |
||
|
sk..
Master Cruncher Joined: Mar 22, 2007 Post Count: 2324 Status: Offline Project Badges: |
To keep with the analogy, what's wrong with coming into work the next day and putting the last wheel on the car? It does not need to go to the scrap heap. 15 tasks completed, 1 task errored, the WU gets re-issued and there is no acknowledgement of the crunchers' efforts. Not cool.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"To keep with the analogy, what's wrong with coming into work the next day and putting the last wheel on the car?"
No use putting the wheel on if the axle is broken. The challenge is that the system does not know what is wrong with the WU, only that it failed. Sure, it is a loss of time, but that is why it is called research. Sorry, but I cannot get on board with the idea of giving crunchers points for effort. We would be rewarding people, including those with unreliable systems, for returning bad results that have no value and cannot be used by the scientists. Some people get hit by this decision, but I have no Invalids or Errors in the 23 pages (345 WU) of results still in my log. |
||
|
sk..
Master Cruncher Joined: Mar 22, 2007 Post Count: 2324 Status: Offline Project Badges: |
Forget the points. Just think about doing 15 tasks (6 to 7h) and having them binned because the 16th failed. Makes no sense - just run the 16th again.
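The "just run the 16th again" idea can be sketched roughly as below. This is purely hypothetical and is not how the WCG or BOINC software works: `run_bundle`, `run_job` and `flaky_run_job` are invented names, and the sketch assumes the results of the earlier jobs can be trusted after a later job crashes, which is exactly the point disputed earlier in the thread.

```python
def run_bundle(jobs, run_job, completed=None):
    """Run jobs in order, skipping any index already present in `completed`.

    Returns (completed, failed_index). failed_index is None when every
    job has a result. Hypothetical sketch, not the real client logic.
    """
    completed = dict(completed or {})
    for index, job in enumerate(jobs):
        if index in completed:
            continue  # keep the earlier result instead of recomputing it
        try:
            completed[index] = run_job(job)
        except Exception:
            # Stop at the first failure but keep what has been done so far.
            return completed, index
    return completed, None


calls = []


def flaky_run_job(job):
    """Stand-in for one job: job 15 fails on its first attempt only."""
    calls.append(job)
    if job == 15 and calls.count(15) == 1:
        raise RuntimeError("simulated crash in job 15")
    return f"result-{job}"


jobs = list(range(16))
first_pass, failed = run_bundle(jobs, flaky_run_job)
second_pass, failed_again = run_bundle(jobs, flaky_run_job, completed=first_pass)
print(len(first_pass), failed, len(second_pass), failed_again)
```

In this toy run the second pass calls `flaky_run_job` once, for job 15 only, rather than repeating jobs 0 to 14. Whether that would be scientifically safe depends on knowing why the job failed, which the system currently does not.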
|
||
|
|