| World Community Grid Forums
Thread Status: Active Total posts in this thread: 9
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello,
Yesterday I installed the latest BOINC version, 5.10.13, on my Windows XP machine. Until then I was using UD Agent without problems, working on all available projects. I noticed that now I can't get an HPF2 WU to finish: every WU computes for a few minutes and then aborts with one of these 2 messages:

1. Result Log
    <core_client_version>5.10.13</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    Failed to get VersionInfo size: 1812
    ERROR:: Exit at: .\dock_structure.cc line:401
    </stderr_txt>
    ]]>

2. Result Log
    <core_client_version>5.10.13</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    Failed to get VersionInfo size: 1812
    sin_cos_range ERROR: 12328.990 is outside of [-1,+1] range
    ERROR:: Exit at: .\utility/sin_cos_range.h line:66
    </stderr_txt>
    ]]>

FAAH WUs look fine:

    ...
    Checkpoint complete
    ________________________________________________________________________________
    autodock4: Successful Completion on "World Community Grid device"
    ________________________________________________________________________________
    AutoDock finishing with return code: 0
    </stderr_txt>
    ]]>

For now I've set it to fetch FAAH work only, so as not to squander good CPU time.

Please help,
Best regards |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
hi Petre_Huica
I think I'll report this to the technicians. It makes no sense, particularly as I've been running 5.10.13 on WXP Pro SP2 for quite a while without any issue on HPF2. Can you check the Result Status page and click on the Work Unit Name to see the list of 19 in the quorum? Can you tell us if any other result is showing 'Error'? When reporting back, can you post a work unit number like le965_xxxxxxx?
Stand by.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi Sekerob,
Regarding HPF2 WUs, I checked and they always end in "Error" status after a few minutes.

Here's the first typical result, WU lf123_00047:

Result Log
    <core_client_version>5.10.13</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    Failed to get VersionInfo size: 1812
    sin_cos_range ERROR: -34.036636 is outside of [-1,+1] range
    ERROR:: Exit at: .\utility/sin_cos_range.h line:66
    </stderr_txt>
    ]]>

The out-of-range value is arbitrary; it differs from one result to the next.

Workunit Status
Project Name: Human Proteome Folding - Phase 2
Created: 07/31/2007 13:36:28
Name: lf123_00047
Minimum Quorum: 15
Initial Replication: 19

Result Name       Status       Sent Time            Time Due / Return Time  CPU Time (hours)  Claimed / Granted BOINC Credit
lf123_00047_19--  Valid        08/02/2007 00:09:12  08/02/2007 08:36:24     8.36              58.0 / 58.7
lf123_00047_9--   Valid        08/01/2007 13:42:51  08/01/2007 21:56:52     3.05              52.6 / 58.7
lf123_00047_8--   Valid        08/01/2007 13:42:41  08/02/2007 11:34:33     4.46              66.8 / 58.7
lf123_00047_1--   Valid        08/01/2007 13:42:37  08/02/2007 20:16:51     7.96              66.3 / 58.7
lf123_00047_5--   In Progress  08/01/2007 13:42:03  08/12/2007 13:42:03     0.00              0.0 / 0.0
lf123_00047_13--  Valid        08/01/2007 13:41:22  08/02/2007 07:38:39     6.24              46.7 / 58.7
lf123_00047_3--   Valid        08/01/2007 13:41:16  08/02/2007 00:28:03     8.72              61.3 / 58.7
lf123_00047_11--  Valid        08/01/2007 13:41:00  08/02/2007 05:12:15     5.18              40.9 / 58.7
lf123_00047_2--   Valid        08/01/2007 13:40:50  08/02/2007 17:26:37     4.03              51.7 / 58.7
lf123_00047_10--  Valid        08/01/2007 13:39:41  08/02/2007 04:53:03     9.97              43.7 / 58.7
lf123_00047_17--  In Progress  08/01/2007 13:39:33  08/12/2007 13:39:33     0.00              0.0 / 0.0
lf123_00047_6--   Valid        08/01/2007 13:38:51  08/02/2007 10:25:06     6.03              55.7 / 58.7
lf123_00047_0--   Error        08/01/2007 13:38:45  08/02/2007 00:06:47     0.15              1.4 / 0.0
lf123_00047_12--  Valid        08/01/2007 13:38:42  08/03/2007 15:52:58     9.38              76.8 / 58.7
lf123_00047_16--  In Progress  08/01/2007 13:38:35  08/12/2007 13:38:35     0.00              0.0 / 0.0
lf123_00047_7--   Valid        08/01/2007 13:38:35  08/02/2007 05:57:12     6.10              65.6 / 58.7
lf123_00047_4--   Valid        08/01/2007 13:38:08  08/02/2007 11:53:51     7.70              61.3 / 58.7
lf123_00047_15--  Valid        08/01/2007 13:37:52  08/02/2007 03:22:01     3.55              62.1 / 58.7
lf123_00047_14--  Valid        08/01/2007 13:37:29  08/01/2007 23:38:27     5.02              47.5 / 58.7
lf123_00047_18--  Valid        08/01/2007 13:37:12  08/02/2007 10:22:07     5.67              56.0 / 58.7

I'm the only one with "Error", and this is consistent with my other HPF2 results, bar some occasional "Error"s from others which are rare enough (1, max 2 per WU). But I'm always "erring", which is strange.

Here's the second typical result, WU lf141_00023:

Result Log
    <core_client_version>5.10.13</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    Failed to get VersionInfo size: 1812
    ERROR:: Exit at: .\dock_structure.cc line:401
    </stderr_txt>
    ]]>

Workunit Status
Project Name: Human Proteome Folding - Phase 2
Created: 07/31/2007 20:11:09
Name: lf141_00023
Minimum Quorum: 15
Initial Replication: 19

Result Name       Status       Sent Time            Time Due / Return Time  CPU Time (hours)  Claimed / Granted BOINC Credit
lf141_00023_19--  Valid        08/02/2007 00:38:20  08/02/2007 05:45:01     3.78              46.7 / 53.4
lf141_00023_15--  Valid        08/02/2007 00:25:50  08/02/2007 10:00:27     4.31              41.1 / 53.4
lf141_00023_0--   Valid        08/02/2007 00:25:28  08/02/2007 06:18:30     3.87              57.2 / 53.4
lf141_00023_10--  Valid        08/02/2007 00:24:37  08/03/2007 12:14:12     6.39              55.6 / 53.4
lf141_00023_13--  Valid        08/02/2007 00:24:02  08/02/2007 21:42:41     8.04              66.8 / 53.4
lf141_00023_6--   Error        08/02/2007 00:23:43  08/02/2007 00:34:08     0.06              0.5 / 0.0
lf141_00023_1--   Valid        08/02/2007 00:23:22  08/02/2007 13:28:49     4.62              42.6 / 53.4
lf141_00023_2--   Valid        08/02/2007 00:22:56  08/02/2007 08:51:43     5.06              45.2 / 53.4
lf141_00023_8--   In Progress  08/02/2007 00:21:48  08/13/2007 00:21:48     0.00              0.0 / 0.0
lf141_00023_11--  Valid        08/02/2007 00:21:43  08/03/2007 13:39:51     10.57             61.5 / 53.4
lf141_00023_16--  Valid        08/02/2007 00:21:36  08/02/2007 10:20:35     4.63              48.3 / 53.4
lf141_00023_17--  Valid        08/02/2007 00:20:58  08/02/2007 19:34:38     4.69              54.6 / 53.4
lf141_00023_12--  Valid        08/02/2007 00:20:02  08/03/2007 09:48:07     8.42              49.8 / 53.4
lf141_00023_5--   Valid        08/02/2007 00:19:48  08/02/2007 15:42:27     9.35              63.2 / 53.4
lf141_00023_3--   Valid        08/02/2007 00:18:35  08/03/2007 03:07:33     6.68              66.7 / 53.4
lf141_00023_4--   In Progress  08/02/2007 00:18:08  08/13/2007 00:18:08     0.00              0.0 / 0.0
lf141_00023_9--   Valid        08/02/2007 00:15:26  08/03/2007 12:00:16     28.59             58.2 / 53.4
lf141_00023_18--  Valid        08/02/2007 00:14:58  08/02/2007 14:17:53     4.81              41.9 / 53.4
lf141_00023_14--  Valid        08/02/2007 00:14:01  08/03/2007 05:06:17     5.06              56.8 / 53.4
lf141_00023_7--   Valid        08/02/2007 00:13:32  08/02/2007 11:15:43     4.39              54.1 / 53.4

As with the first one, I'm the only one with "Error".

I also checked the FAAH WUs in more detail and, unlike what I said in the first place, they're not fine at all either. Here I get either "Invalid" or "Inconclusive" status, even though the result log always ends with:

    ...
    Checkpoint complete
    ________________________________________________________________________________
    autodock4: Successful Completion on "World Community Grid device"
    ________________________________________________________________________________
    AutoDock finishing with return code: 0
    </stderr_txt>
    ]]>

One thing that may be worth mentioning: my device is a laptop. We had some hot weather in July and I installed NHC (2.0 pre-release 6, 14.05.2007) in a bid to lower my CPU temperature by undervolting. I did it by the book, gradually working towards the sweet spot between stability and coolness. I tested each step extensively for stability, precisely because I was crunching and didn't want to give out bad results. At the time I was running UD Agent, with no apparent errors. However, I don't know how (or if it's possible) to see the result logs in UD, so I can't make a direct comparison with the current situation (running BOINC), but maybe this counts for something.

Petre |
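For anyone who wants to check a quorum listing like the ones in this post without counting rows by eye, here is a minimal Python sketch. It assumes each row starts with the result name written as a single token, followed by the status, and that the status is one of the values seen in this thread. `tally_statuses`, `STATUSES` and `SAMPLE_ROWS` are made-up names for illustration, not part of BOINC or the WCG site.

```python
from collections import Counter

# A few sample rows in the layout assumed above (result name, status, then
# the timestamps, CPU time and credit). The values are taken from the
# lf123_00047 listing; the result name is written as one token.
SAMPLE_ROWS = """\
lf123_00047_19-- Valid 08/02/2007 00:09:12 08/02/2007 08:36:24 8.36 58.0 / 58.7
lf123_00047_5-- In Progress 08/01/2007 13:42:03 08/12/2007 13:42:03 0.00 0.0 / 0.0
lf123_00047_0-- Error 08/01/2007 13:38:45 08/02/2007 00:06:47 0.15 1.4 / 0.0
lf123_00047_12-- Valid 08/01/2007 13:38:42 08/03/2007 15:52:58 9.38 76.8 / 58.7
"""

# Status labels mentioned in this thread (assumed to be the full set here).
STATUSES = ("In Progress", "Inconclusive", "Invalid", "Valid", "Error")


def tally_statuses(text):
    """Count how many rows carry each status label."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split(maxsplit=1)  # [result name, rest of the row]
        if len(parts) < 2:
            continue  # blank or malformed line
        for status in STATUSES:
            if parts[1].startswith(status + " "):
                counts[status] += 1
                break
    return counts


if __name__ == "__main__":
    for status, count in tally_statuses(SAMPLE_ROWS).items():
        print(f"{status}: {count}")
```

A single "Error" among many "Valid" results, repeated across work units from the same machine, points at that machine rather than at the work unit.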
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
This looks like a clear hardware error.
Sadly, UD does not report errors or invalid work, so this has probably been going on for some time. I can't recommend undervolting your CPU. Start by resetting everything to the factory defaults, then you can consider alternative ways of cooling. Underclocking is safer than undervolting. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The exact same thing happened to me.
You've lowered the voltage too far. The NHC testing doesn't put much stress on the CPU. You can use Prime95 to test for proper voltage; if it fails with an error, HPF2 will fail too.
----------------------------------------
[Edit 1 times, last edit by Former Member at Aug 5, 2007 2:46:32 PM] |
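The reasoning behind a stress test is that a stable CPU gives the same answer every time it repeats the same calculation, so any mismatch means the hardware is not trustworthy at its current settings. Here is a minimal Python sketch of that idea. It assumes nothing about how Prime95 or HPF2 work internally, `workload` and `repeatability_check` are made-up names, and a check this light is no substitute for a long Prime95 run.

```python
import math


def workload(n=200_000):
    """Deterministic floating-point workload: same input, same output."""
    total = 0.0
    for i in range(1, n + 1):
        total += math.sin(i) * math.cos(i) / math.sqrt(i)
    return total


def repeatability_check(runs=5, n=200_000):
    """Return True if every repeated run matches the first one exactly."""
    reference = workload(n)
    return all(workload(n) == reference for _ in range(runs - 1))


if __name__ == "__main__":
    if repeatability_check():
        print("All runs matched")
    else:
        print("Mismatch between runs: suspect the hardware settings")
```

A pass here proves very little, since the load is small and short; a failure would be a strong hint to raise the voltage or go back to defaults.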
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Didactylos: thank you for the info, I will go up in voltage until I get good results.
Questar: what are your system specs, what was your default voltage, and how far were you able to go down in voltage and still be 100% stable (i.e. get good crunch results)? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Didactylos: thank you for the info, I will go up in voltage until I get good results Questar: what are your system specs, what was your default voltage and how far were you able to go down in voltage and still be 100% stable (ie. get good crunch results)?"

What is stable is going to differ from system to system. Once I found the problem I didn't do a lot of testing to see just how low I could go; I just picked something that worked. I'm running a 1.7 GHz Pentium M with these settings:
6x: 0.732v
8x: 0.812v
10x: 1.084v
12x and above: 1.132v
[Edit 1 times, last edit by Former Member at Aug 6, 2007 11:32:56 PM] |
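The settings in this post amount to a small lookup from multiplier to core voltage, with one voltage shared by everything from 12x up. Here is a minimal Python sketch of that table, assuming the 10x figure is in volts like the others and that multipliers not listed in the post should be rejected rather than guessed. `voltage_for` and the constant names are made up for illustration and have nothing to do with NHC itself.

```python
# Multiplier -> core voltage in volts, as listed in the post.
# The 10x entry is assumed to be in volts like the others.
LISTED_VOLTS = {6: 0.732, 8: 0.812, 10: 1.084}

# "12x and above" all share one voltage.
TOP_STEP_MULTIPLIER = 12
TOP_STEP_VOLTS = 1.132


def voltage_for(multiplier):
    """Return the listed voltage for a multiplier, or raise if unlisted."""
    if multiplier >= TOP_STEP_MULTIPLIER:
        return TOP_STEP_VOLTS
    if multiplier in LISTED_VOLTS:
        return LISTED_VOLTS[multiplier]
    # Assumption: no interpolation for multipliers the post does not mention.
    raise ValueError(f"no voltage listed for {multiplier}x")


if __name__ == "__main__":
    for m in (6, 8, 10, 12, 17):
        print(f"{m}x -> {voltage_for(m):.3f} V")
```

As the post says, these numbers are specific to one machine; another CPU of the same model may need more or less.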
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have a 1.8 GHz Pentium M and I only worry about the maximum multiplier (18x), because the CPU is always maxed out crunching. I used Prime95 and I had to go from 1.052v up to 1.116v to be 100% stable (18h + no errors). I'm satisfied with this and now I'm waiting for BOINC results to come in.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Valid BOINC results are pouring in, HPF2 and FAAH alike.
Problem solved, thanks for your help. |
||
|
|
|