Thread Status: Active
Total posts in this thread: 18
This topic has been viewed 5124 times and has 17 replies
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346
Re: Segmentation violation

Another segmentation violation, but this time I think I've seen what happened. Nearly all RAM was in use (~100%), Firefox was getting slower and slower (response times > 1 minute), and there was only one ARP1 running. When I noticed that the ARP1 unit had failed, I quit Firefox and memory usage went down to about 10% (!). So Firefox is a memory hog (what else is new?). Since I have 'only' 8 GB of memory in my computer, I'm planning to install more memory.
For the record:
Result Name: ARP1_0003713_001_0--
<core_client_version>7.16.1</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[07:19:45] INFO: Checkpoint taken at 2018-07-03_06:00:00
[09:43:58] INFO: Checkpoint taken at 2018-07-03_12:00:00
[12:07:48] INFO: Checkpoint taken at 2018-07-03_18:00:00
[13:52:27] INFO: Checkpoint taken at 2018-07-04_00:00:00
[15:57:10] INFO: Checkpoint taken at 2018-07-04_06:00:00
[18:28:47] INFO: Checkpoint taken at 2018-07-04_12:00:00
SIGSEGV: segmentation violation
Stack trace (18 frames):
[0x2d13b72]
[0x2da0400]
[0x1ed9107]
[0x1e9c664]
[0x1e9444a]
[0x1e8997c]
[0x188518c]
[0x1b6f8e2]
[0x135f570]
[0x11f86d4]
[0x5848b7]
[0x584ece]
[0x448f61]
[0x4475c9]
[0x440967]
[0x2eb2344]
[0x2eb25c1]
[0x405466]

Exiting...

</stderr_txt>
]]>
Result Name           OS            AVN  Status              Sent Time          Due / Return Time  CPUh   Claimed/Gr.
ARP1_0003713_001_2--  Linux Ubuntu  -    In Progress         12/13/19 18:50:17  12/20/19 18:50:17   0.00  0.0/0.0
ARP1_0003713_001_0--  Linux Fedora  727  Error               12/13/19 00:38:55  12/13/19 18:42:41  13.94  617.9/0.0
ARP1_0003713_001_1--  Linux         727  Pending Validation  12/13/19 00:38:54  12/13/19 20:10:13  19.48  791.7/0.0
[Generated by wcgformat]
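
For anyone puzzling over the `process exited with code 193 (0xc1, -63)` line: the three numbers look like the same value written three ways (decimal, hex, and the low byte read as a signed number). As far as I know, 193 is the code the BOINC client reports when a task was killed by a signal, but treat that as my reading rather than something the log states. A minimal Python sketch of the arithmetic, with `as_signed_byte` being a made-up helper name:

```python
def as_signed_byte(value: int) -> int:
    """Reinterpret the low 8 bits of value as a signed (two's complement) byte."""
    low = value & 0xFF
    return low - 256 if low >= 128 else low


code = 193                   # exit code taken from the result log
print(hex(code))             # 0xc1
print(as_signed_byte(code))  # -63
```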
[Dec 14, 2019 1:08:40 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

"So Firefox is a memory hog"

It's not a guess, it's a fact. Monitor how its memory footprint keeps growing even when it's minimized. The more tabs with active elements, the worse. I've seen it go as high as 1.5 GB. I switched to a Chromium-based browser and it's not much better, but not as bad: 900 MB for 13 tabs with about 36 hours of uptime. I read somewhere that MS Edge is also switching, or has switched, to a Chromium base too.

I'm just running 1 at a time, controlled with app_config.xml; the profile allows a max of 3 to buffer, given that there are sometimes days between catching them. I've seen occasional ARP spikes up to 1023 MB in virtual memory use, 800 MB RAM.
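
For reference, here is a rough sketch of what such an app_config.xml could look like, generated from Python so the structure is easy to check. The `<app>`, `<name>` and `<max_concurrent>` elements are BOINC's documented app_config.xml options; the short application name `arp1` is my assumption, so check the name your client actually reports, and `build_app_config` is just an illustrative helper.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Build a BOINC app_config.xml that caps how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name            # short app name (assumed here)
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    ET.indent(root)                                       # pretty-print, Python 3.9+
    return ET.tostring(root, encoding="unicode")


# "arp1" is an assumed short name for the ARP1 application.
print(build_app_config("arp1", 1))
```

The resulting file goes in the project's folder inside the BOINC data directory, and the client has to re-read its config files (or be restarted) before the limit applies.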
[Dec 14, 2019 1:32:30 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2346

Another segmentation violation, and all wingmen are affected this time, apart from the first wingman, _0 (probably aborted before it started):
Result Name           OS               AVN  Status          Sent Time        Due / Return Time  CPUh  Claimed/Gr.
ARP1_0020919_001_4--  Linux Ubuntu     727  Error           1/1/20 09:16:31  1/1/20 09:18:35    0.01  0.2/0.0
ARP1_0020919_001_3--  Linux LinuxMint  727  Error           1/1/20 09:13:57  1/1/20 09:16:28    0.00  0.1/0.0
ARP1_0020919_001_2--  Linux Fedora     727  Error           1/1/20 09:11:25  1/1/20 09:13:49    0.00  0.2/0.0
ARP1_0020919_001_1--  Linux            727  Error           1/1/20 09:09:20  1/1/20 09:11:18    0.01  0.1/0.0
ARP1_0020919_001_0--  Linux Debian     727  Server Aborted  1/1/20 09:09:14  1/1/20 11:16:49    0.00  533.2/0.0
[Generated by wcgformat]

They all carry exactly the same error texts in the Result Log. The following one is from …
Result Name: ARP1_0020919_001_2--
<core_client_version>7.16.1</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
SIGSEGV: segmentation violation
Stack trace (19 frames):
[0x2d13b72]
[0x2da0400]
[0x1ed9107]
[0x1e9c664]
[0x1e9444a]
[0x1e8997c]
[0x188518c]
[0x1b6f8e2]
[0x135f570]
[0x11f86d4]
[0x5848b7]
[0x584ece]
[0x584ece]
[0x448f61]
[0x4475c9]
[0x440967]
[0x2eb2344]
[0x2eb25c1]
[0x405466]

Exiting...

</stderr_txt>
]]>
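
A quick way to check how similar two of these stack traces really are is to pull out the frame addresses and compare them. The sketch below uses the frames from the ARP1_0003713_001_0 and ARP1_0020919_001_2 logs in this thread, joined onto fewer lines for brevity; the `frames` helper and the variable names are made up for illustration.

```python
import re

FRAME = re.compile(r"\[(0x[0-9a-f]+)\]")


def frames(stderr_text: str) -> list:
    """Return the frame addresses of a stack trace in the order they appear."""
    return FRAME.findall(stderr_text)


# Frames copied from the two result logs quoted in this thread.
TRACE_0003713 = """
[0x2d13b72] [0x2da0400] [0x1ed9107] [0x1e9c664] [0x1e9444a] [0x1e8997c]
[0x188518c] [0x1b6f8e2] [0x135f570] [0x11f86d4] [0x5848b7] [0x584ece]
[0x448f61] [0x4475c9] [0x440967] [0x2eb2344] [0x2eb25c1] [0x405466]
"""
TRACE_0020919 = """
[0x2d13b72] [0x2da0400] [0x1ed9107] [0x1e9c664] [0x1e9444a] [0x1e8997c]
[0x188518c] [0x1b6f8e2] [0x135f570] [0x11f86d4] [0x5848b7] [0x584ece]
[0x584ece] [0x448f61] [0x4475c9] [0x440967] [0x2eb2344] [0x2eb25c1] [0x405466]
"""

a, b = frames(TRACE_0003713), frames(TRACE_0020919)
print(len(a), len(b))           # 18 19
print(a[:12] == b[:12])         # True: the innermost 12 frames are identical
print(sorted(set(b) - set(a)))  # []: the only difference is a repeated frame
```

Identical addresses suggest both tasks crashed at the same place in the same executable, which would point more towards the application or the work unit than towards random memory errors, but that is an inference from the addresses alone and not proof.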

[Jan 1, 2020 12:45:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

I started seeing these after upgrading to Ubuntu 19.10. I was seeing these on 3 different machines, all of which were at the 19.10 level. After upgrading a fourth machine to 19.10, I started seeing one or two on that machine as well. The 128-thread EPYC server is also getting a lot of invalids (about 5 per day) since the upgrade. Memory is only about 21% utilized (56 GB out of 256 GB). The EPYC server is also running a mix of CPDN (about 33 tasks) and WCG (about 95 tasks, only MCM). I'm waiting for the last of the CPDN jobs to finish to see if those might be contributing to the issue due to cache pollution.
[Jan 1, 2020 4:55:32 PM]
CurtisNewton
Cruncher
Joined: Feb 24, 2008
Post Count: 25

Similar issues are also reported for the Windows clients, see https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,42054

"Access violation" is the Windows wording for "segmentation violation".
In general, a segmentation violation does not necessarily mean unstable memory. It can also be an access to an unavailable memory location: for example a pointer to already freed memory, an uninitialized pointer, an array access with an unchecked index, or a failed memory allocation that is not handled correctly.
The Windows thread also contains an "Illegal Instruction" message, which can mean that the text/code section of the program, which should be read-only, has been corrupted. That might hint at memory corruption that is not related to hardware issues.
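
To illustrate that a segmentation violation can come from a plain software bug on perfectly healthy hardware, here is a small Python sketch that provokes one on purpose in a child process by reading from address 0 (a null pointer dereference) through ctypes. This is only a demonstration of the general mechanism, not a reconstruction of what happens inside the ARP1 application.

```python
import signal
import subprocess
import sys

# Child program: read a C string starting at address 0, i.e. dereference a null pointer.
CHILD = "import ctypes; ctypes.string_at(0)"

result = subprocess.run([sys.executable, "-c", CHILD], stderr=subprocess.DEVNULL)

# On POSIX systems, subprocess reports death by signal as a negative return code.
if result.returncode < 0:
    print("child killed by", signal.Signals(-result.returncode).name)
else:
    print("child exited with code", result.returncode)
```

On Linux I would expect this to report SIGSEGV; on Windows the same access is reported differently, as an access violation error.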

Carsten
[Jan 1, 2020 5:58:12 PM]
DrMason
Senior Cruncher
Joined: Mar 16, 2007
Post Count: 153

Just wanted to give the devs a heads-up that I had an error on a machine with plenty of RAM and L3 cache, and it looks like at least two other wingmen (-2 and -3) had errors as well. Its name is ARP1_0018714_001. I'm wingman -1. Hopefully the two wingmen crunching it now (-0 and -4) have better luck. Result log pasted below:

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
SIGSEGV: segmentation violation
Stack trace (19 frames):
[0x2d13b72]
[0x2da0400]
[0x1ed9107]
[0x1e9c664]
[0x1e9444a]
[0x1e8997c]
[0x188518c]
[0x1b6f8e2]
[0x135f570]
[0x11f86d4]
[0x5848b7]
[0x584ece]
[0x584ece]
[0x448f61]
[0x4475c9]
[0x440967]
[0x2eb2344]
[0x2eb25c1]
[0x405466]

Exiting...

</stderr_txt>
]]>
[Jan 1, 2020 8:05:22 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Signal Error SIGSEGV 11/exit code 193 is a classic.

https://boinc.mundayweb.com/wiki/index.php?title=Process_got_signal_11
[Jan 1, 2020 11:14:53 PM]
DrMason
Senior Cruncher
Joined: Mar 16, 2007
Post Count: 153

Thanks for the link lavaflow, it confirmed what I suspected. Just wanted to let the devs know that there were at least 3 wingmen on that task reporting errors (and my machine tends to be very, very reliable), so it might be part of a bad batch. They might want to take a look at what went wrong with the task to see if anything needs to be tweaked going forward.

Happy new year, man!
-DrM
[Jan 2, 2020 1:26:12 AM]