Thread Status: Active
Total posts in this thread: 18
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Segmentation violation

Result Name: ARP1_0014879_000_0--
<core_client_version>7.16.1</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[15:45:03] INFO: Checkpoint taken at 2018-07-01_06:00:00
SIGSEGV: segmentation violation
Stack trace (18 frames):
[0x2d13b72]
[0x2da0400]
[0x1ed9107]
[0x1e9c664]
[0x1e9444a]
[0x1e8997c]
[0x188518c]
[0x1b6f8e2]
[0x135f570]
[0x11f86d4]
[0x5848b7]
[0x584ece]
[0x448f61]
[0x4475c9]
[0x440967]
[0x2eb2344]
[0x2eb25c1]
[0x405466]

Exiting...

</stderr_txt>
]]>
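
A side note on the three numbers in "process exited with code 193 (0xc1, -63)": they are the same value written three ways (decimal, hexadecimal, and the low byte read as a signed 8-bit integer). The small Python sketch below only shows that arithmetic; the helper name describe_exit_code is made up for illustration and is not part of BOINC. As far as I know BOINC uses 193 to report that the science application was killed by a signal, which would fit the SIGSEGV in the stderr output, but treat that reading as an assumption.

```python
def describe_exit_code(code):
    """Return (hex string, signed 8-bit value) for a process exit code.

    Hypothetical helper for illustration only; not part of BOINC.
    """
    low_byte = code & 0xFF  # only the low byte of the status is considered here
    # Reinterpret the byte as a signed 8-bit integer (two's complement).
    signed = low_byte - 256 if low_byte >= 128 else low_byte
    return hex(low_byte), signed


if __name__ == "__main__":
    print(describe_exit_code(193))  # ('0xc1', -63)
```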

Result Name           OS               AVN  Status        Sent Time          Due / Return Time  CPUh   Claimed/Granted
ARP1_0014879_000_4--  Linux            727  Valid         11/19/19 06:41:28  11/19/19 22:06:41  14.08  935.7/1,005.2
ARP1_0014879_000_3--  Linux Ubuntu     -    No Reply      11/16/19 19:52:40  11/19/19 06:40:40   0.00  0.0/0.0
ARP1_0014879_000_2--  Linux LinuxMint  727  Valid         11/10/19 15:52:16  11/12/19 10:29:57  37.68  1,074.7/1,005.2
ARP1_0014879_000_0--  Linux Fedora     727  Error         11/10/19 11:38:19  11/10/19 15:52:11   2.72  123.4/0.0
ARP1_0014879_000_1--  Linux LinuxMint  727  User Aborted  11/10/19 11:38:18  11/16/19 19:52:31  51.06  1,077.2/0.0
[Generated by wcgformat]

EDIT 14-11-2019: One wingman's workunit is Pending Validation
EDIT 17-11-2019: First wingman's workunit was User Aborted
EDIT 19-11-2019: Third wingman didn't reply
EDIT 19-11-2019: Fourth wingman's workunit got validated
----------------------------------------
[Edit 4 times, last edit by adriverhoef at Nov 19, 2019 10:15:30 PM]
[Nov 10, 2019 4:12:02 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Re: Segmentation violation

How did it behave? Is this the Checkpoint Reset Bug???
----------------------------------------

...KRI please cancel all shadow-banning
[Nov 10, 2019 5:25:58 PM]
floyd
Cruncher
Joined: May 28, 2016
Post Count: 47
Re: Segmentation violation

Is this the Checkpoint Reset Bug???
Of course it is. The evil Checkpoint Reset Bug is everywhere. Don't forget to look under your bed before you go to sleep. rolling eyes
[Nov 10, 2019 6:21:58 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Re: Segmentation violation

How did it behave? Is this the Checkpoint Reset Bug???

First, let's have a look at the logfile:

Sun 10 Nov 2019 12:38:22 CET | World Community Grid | Started download of ARP1_0014879_000_ARP1_0014879.input
Sun 10 Nov 2019 12:38:22 CET | World Community Grid | Started download of ARP1_0014879_000_ARP1_0014879_input_d01
Sun 10 Nov 2019 12:38:22 CET | World Community Grid | Started download of ARP1_0014879_000_ARP1_0014879_input_d02
Sun 10 Nov 2019 12:38:22 CET | World Community Grid | Started download of ARP1_0014879_000_ARP1_0014879_input_d03
Sun 10 Nov 2019 12:38:25 CET | World Community Grid | Finished download of ARP1_0014879_000_ARP1_0014879.input
Sun 10 Nov 2019 12:38:29 CET | World Community Grid | Finished download of ARP1_0014879_000_ARP1_0014879_input_d01
Sun 10 Nov 2019 12:38:31 CET | World Community Grid | Finished download of ARP1_0014879_000_ARP1_0014879_input_d03
Sun 10 Nov 2019 12:38:32 CET | World Community Grid | Finished download of ARP1_0014879_000_ARP1_0014879_input_d02
Sun 10 Nov 2019 12:56:05 CET | World Community Grid | task ARP1_0014879_000_0 suspended by user
Sun 10 Nov 2019 12:56:07 CET | World Community Grid | task ARP1_0014879_000_0 resumed by user
Sun 10 Nov 2019 13:36:08 CET | World Community Grid | task ARP1_0014879_000_0 suspended by user
Sun 10 Nov 2019 13:36:10 CET | World Community Grid | task ARP1_0014879_000_0 resumed by user
Sun 10 Nov 2019 13:56:06 CET | World Community Grid | Starting task ARP1_0014879_000_0
Sun 10 Nov 2019 13:56:10 CET | World Community Grid | task ARP1_0014879_000_0 suspended by user
Sun 10 Nov 2019 13:56:12 CET | World Community Grid | task ARP1_0014879_000_0 resumed by user
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Computation for task ARP1_0014879_000_0 finished
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_0 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_1 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_2 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_3 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_4 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:47:40 CET | World Community Grid | Output file ARP1_0014879_000_0_r1485167504_5 for task ARP1_0014879_000_0 absent
Sun 10 Nov 2019 16:52:08 CET | World Community Grid | Reporting 1 completed tasks


The task started at 13:56 local time and quit at 16:47 local time, i.e. after 2 hours and 51 minutes (see above). The first checkpoint came at 15:45 local time, 1 hour and 49 minutes into the run (remember "[15:45:03] INFO: Checkpoint taken at 2018-07-01_06:00:00" from the initial posting). So the task quit 1 hour and 2 minutes after the first checkpoint [15:45:03→16:47:40].
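
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It takes the three timestamps from the log and the stderr output and assumes they are all in the same local time zone and on the same day; the helper name elapsed is only for illustration.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"


def elapsed(start, end):
    """Time between two same-day clock readings given as HH:MM:SS strings."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Timestamps copied from the client log and the stderr output.
task_start = "13:56:06"        # "Starting task ARP1_0014879_000_0"
first_checkpoint = "15:45:03"  # "[15:45:03] INFO: Checkpoint taken at ..."
task_end = "16:47:40"          # "Computation for task ... finished"

if __name__ == "__main__":
    print(elapsed(task_start, first_checkpoint))  # 1:48:57
    print(elapsed(task_start, task_end))          # 2:51:34
    print(elapsed(first_checkpoint, task_end))    # 1:02:37
```

Rounded to the minute, that gives the 1 hour 49 minutes, 2 hours 51 minutes, and 1 hour 2 minutes mentioned above.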

So if you think that this is the infamous Checkpoint Reset Bug, where can I find more information about it?
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Nov 10, 2019 6:56:38 PM]
[Nov 10, 2019 6:56:17 PM]
DrMason
Senior Cruncher
Joined: Mar 16, 2007
Post Count: 153
Re: Segmentation violation

Aurum's referencing an issue that he had with using third party software. BoincTasks's "suspend after checkpoint" function apparently wasn't working quite right just after launch. By timeslicing with another project, he lost some work because the units reset to the last checkpoint. No diagnostic info provided for that, but that could happen if the application data wasn't stored in memory when the other units were crunching. Finally, he seems to have alleged an issue where the sixth checkpoint was not created on a task (at 75%, a sixth checkpoint should be created, because a checkpoint should be created every 12.5%), but I don't remember any data on that task, or comparative data, actually being posted. If he can post data, I'm sure the techs would love to see it so the issue could be identified. However, there's a chance whatever third party software he's running may be interfering. I don't remember anyone else having these problems running any sort of vanilla BOINC installation.

There are also general complaints about work lost because an unexpected shutdown occurred and the task was reset to a previous checkpoint. But that's not a bug; it's just a limitation of the software and the models being used.

Can't say for sure, but your error doesn't look like it has anything to do with the 1st checkpoint. Not sure what a segmentation violation is per se, but hopefully someone can look at it and make sure it was a one-off error and not something systematic. Hope that answered one of your questions! smile
----------------------------------------
[Edit 1 times, last edit by DrMason at Nov 11, 2019 10:03:47 AM]
[Nov 11, 2019 9:57:48 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Re: Segmentation violation

Aurum's referencing an issue that he had with using third party software.
Oh, I don't use BoincTasks at all.
Hope that answered one of your questions! smile
It did. Thanks, DrMason!
[Nov 11, 2019 12:06:35 PM]
Jim1348
Veteran Cruncher
USA
Joined: Jul 13, 2009
Post Count: 1066
Re: Segmentation violation

I suspect that "segmentation violation" is due to hardware, and memory in particular. That doesn't mean the memory is bad, only unstable. I sometimes see it when four memory modules are used. Yesterday I pulled out two 16 GB modules and left two in place, since I was seeing that error once in a while on CPDN on that machine, while I did not see it on any of my other machines. I think that will fix it.

By the way, the speed rating of memory modules applies to only two modules. You take your chances with four. You might need to reduce the speed to the motherboard default (probably 2133 MHz these days) in order to use four reliably.
[Nov 11, 2019 12:46:58 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Re: Segmentation violation

Is this the Checkpoint Reset Bug???
Of course it is. The evil Checkpoint Reset Bug is everywhere. Don't forget to look under your bed before you go to sleep. rolling eyes

We need the ability to block those that make ad hominem attacks.
----------------------------------------

...KRI please cancel all shadow-banning
[Nov 11, 2019 4:26:51 PM]
Aurum
Master Cruncher
The Great Basin
Joined: Dec 24, 2017
Post Count: 2391
Re: Segmentation violation

Aurum's referencing an issue that he had with using third party software. BoincTasks's "suspend after checkpoint" function apparently wasn't working quite right just after launch. By timeslicing with another project, he lost some work because the units reset to the last checkpoint. No diagnostic info provided for that, but that could happen if the application data wasn't stored in memory when the other units were crunching.
1. Not third party software but BOINC 7.9.3 & 7.14.2.
2. Not using "Suspend at Checkpoint" from BoincTasks.
3. Using timeslicing because ARP1 WUs can only be supplied for less than 20% of CPU threads. I added Universe@home and even with "leave application in memory" checked for every computer it dropped back to the last checkpoint when BOINC switched from WCG leading to Universe. Since there are not enough ARP1 WUs to occupy every CPU thread it adds BHspin2 to fill out the list. When the lead switches to Universe it's all BHspin2 WUs. When it switches back to WCG leading all ARP1 WUs restart at their last checkpoint. This does not happen when ARP1 is filled out with HST1, MIP1, and MCM1 because it does not seem to switch ARP1 WUs off.
Finally, he seems to have alleged an issue where the sixth checkpoint was not created on a task (at 75%, a sixth checkpoint should be created, because a checkpoint should be created every 12.5%), but I don't remember any data on that task, or comparative data, actually being posted. If he can post data, I'm sure the techs would love to see it so the issue could be identified. However, there's a chance whatever third party software he's running may be interfering. I don't remember anyone else having these problems running any sort of vanilla BOINC installation.
4. This is from the test of BoincTasks 1.80's Suspend at Checkpoint which does not work. Others confirmed that it may never have worked.
----------------------------------------

...KRI please cancel all shadow-banning
[Nov 11, 2019 4:44:50 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Re: Segmentation violation

.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

----------------------------------------
[Edit 1 times, last edit by hchc at Dec 20, 2019 11:50:25 AM]
[Nov 11, 2019 5:58:03 PM]