| World Community Grid Forums
Thread Status: Active Total posts in this thread: 15
|
|
|
|
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
|
Greetings!!

From 20 to 25 December 2020, 15 MIP WUs failed with signal 11 for me. In the same period, 11 MIP WUs were valid. African Rain, Covid, and Help Stop TB WUs were also valid.

I have plenty of memory. Running one Einstein@Home GPU task and 7 WCG tasks, the PC currently uses 3.3 GiB of its 11.6 GiB of memory (28%, including the OS).

Anyone else seeing this? Or should I run memory diagnostics for a day?

Other details:
Kernel: Linux 5.8.0-33-generic x86_64
OS: Ubuntu MATE Release 20.10 (Groovy Gorilla) 64-bit
Processor: AMD FX(tm)-8150 Eight-Core Processor × 8

BOINC log at startup, from syslog:
Dec 19 10:27:09 pc-14 boinc[1271]: 19-Dec-2020 10:27:09 [---] Starting BOINC client version 7.16.11 for x86_64-pc-linux-gnu

So... anyone else seeing a SIGSEGV?

Thanks!!
Jay

PS Sample WU with error:
https://www.worldcommunitygrid.org/ms/device/...s.do?workunitId=452568867
and
https://www.worldcommunitygrid.org/ms/device/...og.do?resultId=1391588198
and

Result Log
Result Name: MIP1_ 00327420_ 7452_ 0--
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message> process got signal 11</message>
<stderr_txt>
[2020-12-25 0:43:29:] :: BOINC:: Initializing ... ok.
[2020-12-25 0:43:29:] :: BOINC :: boinc_init()
INFO: result number = 0
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: ../../projects/www.worldcommunitygrid.org/wcgrid_mip1_rosetta_7.16_x86_64-pc-linux-gnu -in::file::zip MIP1_databasev2.zip @./MIP1_00327420.flags -out::file::silent result_silent.out -run:jran 937053714 -nstruct 1 -out::level 100 -run::no_scorefile true
Registering options.. Registered extra options.
Initializing broker options ... Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/www.worldcommunitygrid.org/mip1.MIP1_databasev2.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
set_shared_memory_fully_initialized ...
abrelax ...
abrelax.run
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Sequence Length = 345
Starting work on structure: _0001
</stderr_txt>
]]>

PPS Merry Christmas

[Edit 1 times, last edit by jay_Orlando at Dec 25, 2020 7:37:30 AM] |
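For anyone wanting to check their own result logs for the same failure, here is a minimal sketch that looks for the "process got signal N" message shown in the log above and maps the number to a signal name. The function name find_signal is hypothetical and only for illustration; the name lookup uses Python's standard signal module, so it reflects the platform the script runs on (on Linux, 11 is SIGSEGV).

```python
import re
import signal


def find_signal(result_log: str):
    """Return (number, name) for the first 'process got signal N' message, or None.

    find_signal is a hypothetical helper, not part of BOINC or WCG.
    """
    match = re.search(r"process got signal (\d+)", result_log)
    if match is None:
        return None
    number = int(match.group(1))
    try:
        # Look up the name on the current platform (SIGSEGV for 11 on Linux).
        name = signal.Signals(number).name
    except ValueError:
        name = "unknown"
    return number, name


# The message line as it appears in the result log quoted in this post.
sample = "<message> process got signal 11</message>"
print(find_signal(sample))
```

Pasting the full text of a result log into the function works the same way; a log without the message returns None.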
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7848 Status: Offline Project Badges:
|
How many MIP work units are you running at the same time?
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
|
Hey Joe,

Happy Boxing Day!!

In response to your post:
First, are you seeing the signal 11 problem on MIP WUs?
Second, the number of MIP WUs varies as WCG fills requests. Right now only two.

Why? I would like to understand your line of thought.

I have run memory diagnostics for about 10 hours - no errors. I have turned on memory logging.
Turning on RPC logging destroys my BOINC manager display. Does this destroy your display? Please check. |
||
|
|
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
|
Update.

26 December 2020: 28 failures on MIP between midnight and 1 PM. Will stop MIP for a while. No errors on other projects.
Reinstalled BOINC and libraries - still errors.

Anyone else seeing this?

I have two machines with Ubuntu Linux. One has the errors, the other does not.
ubuntu 20.04.1 LTS BOINC 7.16.6+dfsg-1 --- No errors on MIP or others
ubuntu 20.10 BOINC 7.16.15.11+dfsg-1 --- errors (signal 11)

Anyone else with ubuntu 20.10 AND BOINC 7.16.15.11+dfsg-1 having sig 11??

Thanks
Jay

PS will also check ubuntu forum. |
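The two-machine comparison can be written down as a small diff, which makes it easier to see which attributes are still candidates for the cause. This is only a sketch: the dictionary keys and the helper differing_keys are made up for illustration, and the values are copied from the post as written. The post does not state the CPU of the second machine, so it is left out here.

```python
def differing_keys(a: dict, b: dict) -> list:
    """Return the sorted keys whose values differ between two host descriptions."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Values as listed in the post; key names are hypothetical.
failing = {"os": "Ubuntu 20.10", "boinc": "7.16.15.11+dfsg-1", "mip_signal_11": True}
healthy = {"os": "Ubuntu 20.04.1 LTS", "boinc": "7.16.6+dfsg-1", "mip_signal_11": False}

# Everything that differs, apart from the symptom itself, is still a suspect.
suspects = [k for k in differing_keys(failing, healthy) if k != "mip_signal_11"]
print(suspects)  # ['boinc', 'os']
```

With both the OS release and the BOINC package differing, this comparison alone cannot separate the two; adding further attributes (CPU, kernel, library versions) to each dictionary would narrow it down.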
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7848 Status: Offline Project Badges:
|
Why? I would like to understand your line of thought.

I had some signal 11 problems years ago, but never did figure out a cause. I had a theory that there was a traffic jam on access to either memory, CPU, or hard drive. I was partial to the memory access theory, but I have no way to tell for sure. I haven't seen this on any of my Linux systems for several years.

Edit: By the way, what are the CPUs in your systems, and how much memory?

Cheers
Sgt. Joe
*Minnesota Crunchers* [Edit 1 times, last edit by Sgt.Joe at Dec 26, 2020 8:49:09 PM] |
||
|
|
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
|
Another person reported this problem here:
https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,41744

Apparently the problems occur with AMD. Maybe it was a compiler flag used when generating the Debian/Ubuntu package.
My workaround (for now) is not to run MIP on the AMD machine.

Joe,
The info you want was listed in the original post above with the BOINC log startup.
:-)

Cheers,
jay |
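Since the reports seem to cluster on AMD, here is a small sketch for checking the CPU vendor of a Linux host before deciding whether to apply the same workaround. It parses the vendor_id field that /proc/cpuinfo reports on x86 Linux, where AMD processors show AuthenticAMD. The helper names cpu_vendor and is_amd are hypothetical, and the sample text is illustrative rather than copied from a real machine.

```python
from pathlib import Path


def cpu_vendor(cpuinfo_text: str):
    """Return the first vendor_id value found in /proc/cpuinfo-style text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def is_amd(cpuinfo_text: str) -> bool:
    """True when the reported vendor is AMD (vendor_id AuthenticAMD)."""
    return cpu_vendor(cpuinfo_text) == "AuthenticAMD"


# Illustrative sample in the /proc/cpuinfo layout; not captured from a real host.
sample = "vendor_id\t: AuthenticAMD\nmodel name\t: AMD FX(tm)-8150 Eight-Core Processor\n"
print(is_amd(sample))

# On a Linux host, the real file can be checked the same way.
cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print(is_amd(cpuinfo.read_text()))
```

This only identifies the vendor; it says nothing about whether a given machine will actually hit the signal 11 failures.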
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7848 Status: Offline Project Badges:
|
Joe, The info you want was listed in the original post above with the BOINC log startup.

Sorry for not reading the OP.

Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
|
Greetings,

Is there any debug output that would help? I've tried the flags, but nothing is logged at the time of the signal 11.
Does the project want a volunteer?

Jay |
||
|
|
Brian Nixon
Cruncher United Kingdom Joined: Oct 27, 2020 Post Count: 9 Status: Offline Project Badges:
|
There have been reports of unexplained failures like this over at Rosetta@home, too – particularly with Linux on AMD. It seems like the kind of obscure bug that will be effectively impossible to track down without a debug build and the Rosetta source code.
|
||
|
|
jay_Orlando
Senior Cruncher USA Joined: Jan 4, 2006 Post Count: 189 Status: Offline Project Badges:
|
Brian,

Thanks for the info!!
I added some packages and slowed down the memory accesses as per Mxd1 in the MIP forum. No luck.
I agree; probably something obscure like an instruction fault.

Oh well, I set another machine on a different venue working 100% MIP.

T H A N K S again,
Jay |
||
|
|
|