| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 18
|
|
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Another segmentation violation, but this time I think I've seen what happened. Nearly all RAM was in use (~100%), Firefox was getting slower and slower (response times > 1 minute), and there was only one ARP1 running. When I noticed that the ARP1 unit had failed, I quit Firefox and memory usage went down to about 10% (!). So Firefox is a memory hog (what else is new?). Since I have 'only' 8 GB of memory in my computer, I'm planning to install more memory.
For the record: Result Name: ARP1_ 0003713_ 001_ 0--
Result Name | OS | AVN | Status | Sent Time | Due / Return Time | CPUh | Claimed/Gr.
[Generated by wcgformat] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"So Firefox is a memory hog"
It's not a guess, it's a fact. Monitor how its memory footprint keeps growing even when it's minimized. The more tabs with active elements, the worse. I've seen it go as high as 1.5 GB. I switched to a Chrome-engine-based browser and it's not much better, but not as bad... 900 MB for 13 tabs with about 36 hours uptime. I read somewhere that MS Edge is also switching, or has switched, to a Chrome(ium) base too. I'm just running 1 at a time with app_config.xml control; the profile allows a max of 3 to buffer, given that there are sometimes days between catching them. I've seen occasional ARP spikes up to 1023 MB in virtual memory use, 800 MB RAM. |
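For anyone who wants to watch a single process rather than the whole machine, here is a minimal sketch of reading the virtual and resident sizes on Linux. It assumes the `VmSize` and `VmRSS` lines of `/proc/<pid>/status`; the helper names and the sample text are made up for illustration, with the sample numbers chosen to mirror the ARP figures quoted above.

```python
import re


def parse_vm_fields(status_text):
    """Return VmSize and VmRSS in MB (1 MB = 1024 kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2)) / 1024  # kB -> MB
    return fields


def read_vm_fields(pid):
    # Linux only: VmSize is the virtual size, VmRSS the resident (RAM) size.
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())


# Hypothetical sample text, not taken from a real task.
sample = "Name:\texample\nVmSize:\t 1047552 kB\nVmRSS:\t  819200 kB\n"
usage = parse_vm_fields(sample)
print(usage)
```

Calling `read_vm_fields(pid)` every minute or so for the browser or the ARP1 task would show whether the footprint really keeps growing over time.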
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
Another segmentation violation
and all wingmen are affected this time, apart from the first wingman, _0 (probably before it started):
Result Name | OS | AVN | Status | Sent Time | Due / Return Time | CPUh | Claimed/Gr.
[Generated by wcgformat]
They all carry exactly the same error texts in the Result Log. The following one is from … Result Name: ARP1_ 0020919_ 001_ 2-- |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I started seeing these after upgrading to Ubuntu 19.10. I was seeing them on 3 different machines, all of which were at the 19.10 level. After upgrading a fourth machine to 19.10, I started seeing one or two on that machine as well. The 128-thread EPYC server has also been getting a lot of invalids (about 5 per day) since the upgrade. Memory is only about 21% utilized (56 GB out of 256 GB). The EPYC server is also running a mix of CPDN (about 33 tasks) and WCG (about 95 tasks, only MCM). I'm waiting for the last of the CPDN jobs to finish to see if those might be contributing to the issue due to cache pollution.
|
||
|
|
CurtisNewton
Cruncher Joined: Feb 24, 2008 Post Count: 25 Status: Offline Project Badges:
|
Similar issues are also reported for the Windows clients, see https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,42054
"Access violation" is the Windows wording for "segmentation violation". In general, a segmentation violation does not necessarily mean unstable memory. It can also be an access to an unavailable memory location, e.g. a pointer to an already freed memory location or an uninitialized pointer could cause this message, as could array accesses with unchecked indices or a memory allocation that failed and was not handled correctly. The Windows thread also contains an "Illegal Instruction" message, which can mean that the text/code sections of the program, which should be read-only, have been corrupted. That might hint at memory corruption that is not related to hardware issues. Carsten |
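To illustrate the point that a bad pointer alone is enough, with perfectly healthy RAM, here is a small sketch that makes a child process read from address 0 and then looks at how it died. It assumes a POSIX system (Linux or macOS), where Python's `subprocess` reports death by signal as a negative return code; on Windows the child would instead end with an access violation status. The function name is made up for the example.

```python
import signal
import subprocess
import sys


def run_null_read():
    """Run a child Python that reads from address 0 and return its exit status."""
    child_code = "import ctypes; ctypes.string_at(0)"  # dereferences a NULL pointer
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


returncode = run_null_read()
if returncode < 0:
    # On POSIX a negative value means "killed by signal -returncode".
    print("child killed by", signal.Signals(-returncode).name)
else:
    print("child exit status", returncode)
```

Only the child crashes; the parent keeps running and can report the signal, which is roughly the position the BOINC client is in when a science application dies.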
||
|
|
DrMason
Senior Cruncher Joined: Mar 16, 2007 Post Count: 153 Status: Offline Project Badges:
|
Just wanted to give the devs a heads-up that I had an error on a machine with plenty of RAM and L3 cache, and it looks like at least two other wingmen (-2 and -3) had errors as well. Its name is ARP1_ 0018714_ 001. I'm wingman -1. Hopefully the two wingmen crunching it now (-0 and -4) have better luck. Result log pasted below:
----------------------------------------
<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)</message>
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
SIGSEGV: segmentation violation
Stack trace (19 frames):
[0x2d13b72] [0x2da0400] [0x1ed9107] [0x1e9c664] [0x1e9444a] [0x1e8997c] [0x188518c] [0x1b6f8e2] [0x135f570] [0x11f86d4] [0x5848b7] [0x584ece] [0x584ece] [0x448f61] [0x4475c9] [0x440967] [0x2eb2344] [0x2eb25c1] [0x405466]
Exiting...
</stderr_txt>
]]>
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Signal Error SIGSEGV 11/exit code 193 is a classic.
https://boinc.mundayweb.com/wiki/index.php?title=Process_got_signal_11 |
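In case the three numbers in "process exited with code 193 (0xc1, -63)" look unrelated: they are the same byte written three ways. A minimal sketch, assuming the exit code is shown as an unsigned byte, in hex, and as a signed 8-bit value (the function name is just for illustration):

```python
import signal


def exit_code_views(code):
    """Return (unsigned, hex, signed 8-bit) views of a process exit code."""
    unsigned = code & 0xFF
    signed = unsigned - 256 if unsigned >= 128 else unsigned
    return unsigned, hex(unsigned), signed


print(exit_code_views(193))  # (193, '0xc1', -63)
print(int(signal.SIGSEGV))   # 11 on Linux
```

The 11 in "signal 11" is simply the number of SIGSEGV, as the second line shows.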
||
|
|
DrMason
Senior Cruncher Joined: Mar 16, 2007 Post Count: 153 Status: Offline Project Badges:
|
Thanks for the link lavaflow, it confirmed what I suspected. Just wanted to let the devs know that there were at least 3 wingmen on that task reporting errors (and my machine tends to be very, very reliable), so it might be part of a bad batch. They might want to take a look at what went wrong with the task to see if anything needs to be tweaked going forward.
----------------------------------------
Happy new year, man! -DrM |
||
|
|
|