| World Community Grid Forums
Thread Status: Active Total posts in this thread: 18
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Recently Active Project Badges:
|
Result Name: ARP1_0014879_000_0
----------------------------------------
<core_client_version>7.16.1</core_client_version>
<![CDATA[
<message>process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[15:45:03] INFO: Checkpoint taken at 2018-07-01_06:00:00
SIGSEGV: segmentation violation
Stack trace (18 frames):
[0x2d13b72]
[0x2da0400]
[0x1ed9107]
[0x1e9c664]
[0x1e9444a]
[0x1e8997c]
[0x188518c]
[0x1b6f8e2]
[0x135f570]
[0x11f86d4]
[0x5848b7]
[0x584ece]
[0x448f61]
[0x4475c9]
[0x440967]
[0x2eb2344]
[0x2eb25c1]
[0x405466]
Exiting...
</stderr_txt>
]]>

Result Name OS AVN Status Sent Time Due / Return Time CPUh Claimed/Granted
[Generated by wcgformat]

EDIT 14-11-2019: One wingman's workunit is Pending Validation
EDIT 17-11-2019: First wingman's workunit was User Aborted
EDIT 19-11-2019: Third wingman didn't reply
EDIT 19-11-2019: Fourth wingman's workunit got validated
[Edit 4 times, last edit by adriverhoef at Nov 19, 2019 10:15:30 PM] |
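The three numbers in the `<message>` line above are the same exit status written three ways. A minimal Python sketch of that arithmetic follows; the function name `describe_exit_code` is made up for illustration and is not part of BOINC or wcgformat.

```python
def describe_exit_code(code: int) -> tuple[str, int]:
    """Return the hex form and the signed 8-bit reading of an exit status.

    Hypothetical helper for illustration only; it just reproduces the
    arithmetic behind "193 (0xc1, -63)".
    """
    if not 0 <= code <= 255:
        raise ValueError("expected an exit status in the range 0..255")
    # Values of 128 and above wrap to negative numbers when read as a signed byte.
    signed = code - 256 if code >= 128 else code
    return hex(code), signed


print(describe_exit_code(193))  # ('0xc1', -63)
```

So 193, 0xc1 and -63 all describe one value; the message does not report three separate errors.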
||
|
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2391 Status: Offline Project Badges:
|
How did it behave? Is this the Checkpoint Reset Bug???
----------------------------------------
...KRI please cancel all shadow-banning |
||
|
|
floyd
Cruncher Joined: May 28, 2016 Post Count: 47 Status: Offline Project Badges:
|
Aurum wrote: Is this the Checkpoint Reset Bug???

Of course it is. The evil Checkpoint Reset Bug is everywhere. Don't forget to look under your bed before you go to sleep. |
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Recently Active Project Badges:
|
Aurum wrote: How did it behave? Is this the Checkpoint Reset Bug???

First, let's have a look at the logfile:

Sun 10 Nov 2019 12:38:22 CET | World Community Grid | Started download of ARP1_0014879_000_ARP1_0014879.input
Sun 10 Nov 2019 12:38:22 CET | World Community Grid | Started download of ARP1_0014879_000_ARP1_0014879_input_d01
Sun 10 Nov 2019 12:38:22 CET | World Community Grid | Started download of ARP1_0014879_000_ARP1_0014879_input_d02
Sun 10 Nov 2019 12:38:22 CET | World Community Grid | Started download of ARP1_0014879_000_ARP1_0014879_input_d03
Sun 10 Nov 2019 12:38:25 CET | World Community Grid | Finished download of ARP1_0014879_000_ARP1_0014879.input
Sun 10 Nov 2019 12:38:29 CET | World Community Grid | Finished download of ARP1_0014879_000_ARP1_0014879_input_d01
Sun 10 Nov 2019 12:38:31 CET | World Community Grid | Finished download of ARP1_0014879_000_ARP1_0014879_input_d03
Sun 10 Nov 2019 12:38:32 CET | World Community Grid | Finished download of ARP1_0014879_000_ARP1_0014879_input_d02
Sun 10 Nov 2019 12:56:05 CET | World Community Grid | task ARP1_0014879_000_0 suspended by user
Sun 10 Nov 2019 12:56:07 CET | World Community Grid | task ARP1_0014879_000_0 resumed by user
Sun 10 Nov 2019 13:36:08 CET | World Community Grid | task ARP1_0014879_000_0 suspended by user
Sun 10 Nov 2019 13:36:10 CET | World Community Grid | task ARP1_0014879_000_0 resumed by user
Sun 10 Nov 2019 13:56:06 CET | World Community Grid | Starting task ARP1_0014879_000_0
Sun 10 Nov 2019 13:56:10 CET | World Community Grid | task ARP1_0014879_000_0 suspended by user
Sun 10 Nov 2019 13:56:12 CET | World Community Grid | task ARP1_0014879_000_0 resumed by user
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Computation for task ARP1_0014879_000_0 finished
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_0 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_1 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_2 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_3 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_4 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_5 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:52:08 CET | World Community Grid | Reporting 1 completed tasks

Starting at 13:56 local time, after 2 hours and 51 minutes the task quits at 16:47 local time, see above, while the first checkpoint (after 1 hour and 49 minutes) is at 15:45 local time (remember "[15:45:03] INFO: Checkpoint taken at 2018-07-01_06:00:00" from the initial posting). So the task quits 1 hour and 2 minutes after the first checkpoint [15:45:03→16:47:40].

So if you think that this is the infamous Checkpoint Reset Bug, where can I find more information about it?
[Edit 1 times, last edit by adriverhoef at Nov 10, 2019 6:56:38 PM] |
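The intervals quoted in this post can be rechecked from the three timestamps. Here is a small Python sketch, assuming the `[15:45:03]` stamp in the stderr output is in the same local time (CET) and on the same day as the client log entries; the helper name `elapsed` is made up for illustration.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def elapsed(start: str, end: str) -> str:
    """Return end - start as H:MM:SS for two same-day clock times."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


task_start = "13:56:06"   # "Starting task" line in the client log
checkpoint = "15:45:03"   # first checkpoint, from the stderr output
task_end = "16:47:40"     # "Computation ... finished" line in the client log

print(elapsed(task_start, task_end))    # 2:51:34
print(elapsed(task_start, checkpoint))  # 1:48:57
print(elapsed(checkpoint, task_end))    # 1:02:37
```

Rounded to the minute, that gives the 2 hours 51 minutes, 1 hour 49 minutes, and 1 hour 2 minutes stated above.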
||
|
|
DrMason
Senior Cruncher Joined: Mar 16, 2007 Post Count: 153 Status: Offline Project Badges:
|
Aurum's referencing an issue that he had with using third party software. BoincTasks's "suspend after checkpoint" function apparently wasn't working quite right just after launch. By timeslicing with another project, he lost some work because the units reset to the last checkpoint. No diagnostic info provided for that, but that could happen if the application data wasn't stored in memory when the other units were crunching. Finally, he seems to have alleged an issue where the sixth checkpoint was not created on a task (at 75%, a sixth checkpoint should be created, because a checkpoint should be created every 12.5%), but I don't remember any data on that task, or comparative data, actually being posted. If he can post data, I'm sure the techs would love to see it so the issue could be identified. However, there's a chance whatever third party software he's running may be interfering. I don't remember anyone else having these problems running any sort of vanilla BOINC installation.
----------------------------------------
There are also general complaints arising from work lost because an unexpected shutdown occurred and work was reset to a previous checkpoint. But that's not a bug; it's just the limitations of the software and the models being used.

Can't say for sure, but your error doesn't look like it has anything to do with the 1st checkpoint. Not sure what a segmentation violation is per se, but hopefully someone can look at it and make sure it was a one-off error and not something systematic. Hope that answered one of your questions!
[Edit 1 times, last edit by DrMason at Nov 11, 2019 10:03:47 AM] |
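The checkpoint spacing DrMason describes (one checkpoint every 12.5% of progress) puts the sixth checkpoint at 75%. A short Python sketch of that arithmetic follows. The 12.5% interval comes from the post; the names are made up for illustration, and whether the application also writes a checkpoint at 100% is not stated in the thread, so only points below 100% are listed.

```python
CHECKPOINT_INTERVAL = 12.5  # percent of progress between checkpoints, per the post


def checkpoint_percent(n: int) -> float:
    """Progress (in percent) at which the n-th checkpoint would be expected."""
    return n * CHECKPOINT_INTERVAL


# Expected checkpoints strictly before completion.
expected = []
n = 1
while checkpoint_percent(n) < 100:
    expected.append(checkpoint_percent(n))
    n += 1

print(expected)               # [12.5, 25.0, 37.5, 50.0, 62.5, 75.0, 87.5]
print(checkpoint_percent(6))  # 75.0
```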
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Recently Active Project Badges:
|
DrMason wrote: Aurum's referencing an issue that he had with using third party software.

Oh, I don't use BoincTasks at all.

DrMason wrote: Hope that answered one of your questions!

It did. Thanks, DrMason! |
||
|
|
Jim1348
Veteran Cruncher USA Joined: Jul 13, 2009 Post Count: 1066 Status: Offline Project Badges:
|
I suspect that "segmentation violation" is due to hardware, and memory in particular. It doesn't mean that it is bad, only unstable. I sometimes see it when four memory modules are used. Yesterday, I just pulled out two 16 GB modules and left two in place, since I was seeing that error once in a while on CPDN, where I did not see it on any of my other machines. I think that will fix it.
The speed rating of memory modules is for only two by the way. You take your chances with four. You might need to reduce the speed to the motherboard default (probably 2133 MHz these days) in order to use four reliably. |
||
|
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2391 Status: Offline Project Badges:
|
Aurum wrote: Is this the Checkpoint Reset Bug???

floyd wrote: Of course it is. The evil Checkpoint Reset Bug is everywhere. Don't forget to look under your bed before you go to sleep.

We need the ability to block those that make ad hominem attacks.
...KRI please cancel all shadow-banning |
||
|
|
Aurum
Master Cruncher The Great Basin Joined: Dec 24, 2017 Post Count: 2391 Status: Offline Project Badges:
|
DrMason wrote: Aurum's referencing an issue that he had with using third party software. BoincTasks's "suspend after checkpoint" function apparently wasn't working quite right just after launch. By timeslicing with another project, he lost some work because the units reset to the last checkpoint. No diagnostic info provided for that, but that could happen if the application data wasn't stored in memory when the other units were crunching.

1. Not third party software but BOINC 7.9.3 & 7.14.2.
2. Not using "Suspend at Checkpoint" from BoincTasks.
3. Using timeslicing because ARP1 WUs can only be supplied for less than 20% of CPU threads. I added Universe@home and even with "leave application in memory" checked for every computer it dropped back to the last checkpoint when BOINC switched from WCG leading to Universe. Since there are not enough ARP1 WUs to occupy every CPU thread it adds BHspin2 to fill out the list. When the lead switches to Universe it's all BHspin2 WUs. When it switches back to WCG leading all ARP1 WUs restart at their last checkpoint. This does not happen when ARP1 is filled out with HST1, MIP1, and MCM1 because it does not seem to switch ARP1 WUs off.

DrMason wrote: Finally, he seems to have alleged an issue where the sixth checkpoint was not created on a task (at 75%, a sixth checkpoint should be created, because a checkpoint should be created every 12.5%), but I don't remember any data on that task, or comparative data, actually being posted. If he can post data, I'm sure the techs would love to see it so the issue could be identified. However, there's a chance whatever third party software he's running may be interfering. I don't remember anyone else having these problems running any sort of vanilla BOINC installation.

4. This is from the test of BoincTasks 1.80's Suspend at Checkpoint which does not work. Others confirmed that it may never have worked.
...KRI please cancel all shadow-banning |
||
|
|
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 865 Status: Offline Project Badges:
|
.
----------------------------------------
[Edit 1 times, last edit by hchc at Dec 20, 2019 11:50:25 AM] |
||
|
|
|