World Community Grid Forums
Category: Completed Research
Forum: Discovering Dengue Drugs - Together - Phase 2
Thread: DDDT2: *** glibc detected *** double free or corruption (fasttop):
Thread Status: Active Total posts in this thread: 3
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I never looked in the message log until last night, and found the warning/error from the topic title in ALL ts01 results since the 20th, even though they ALL validate (Linux 10.10, kernel 2.6.35.27). Other sciences don't show this (HCMD2/CEP2/C4CW). Make that nearly ALL: the message plus a limited stack dump appear in the result log. Sample:

----------------------------------------
Result Name: ts01_ a023_ pr23b0_ 0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.
called boinc_finish
*** glibc detected *** double free or corruption (fasttop): 0x1ec2da10 ***
SIGABRT: abort called
Stack trace (20 frames):
[0x86d160f] [0x873aeb0] [0xf77a6400] [0x87442e4] [0x875a58b] [0x875f81d] [0x875fbd7] [0x87299e5] [0x86d52f6] [0x86cff5b]
[0x8744cc7] [0x86cda8c] [0x86cdb9b] [0x8051111] [0x83f31aa] [0x84412ce] [0x86adc03] [0x804f4d3] [0x873cfaa] [0x8048131]
Exiting...
</stderr_txt>
]]>

The hex codes are not equal from result to result. I nosed around the web and found a thread discussing this as an "old Linux error", plus a command to suppress it:

export MALLOC_CHECK_=0

I've done that, and now the wait is on to see whether it goes away. At any rate, if anyone else sees this: here at least, 99% validate immediately. Two sit in the Inconclusive cycle, waiting on a second wingman. I saw one like that a few days ago and it validated; at least no invalids are listed in the 6 pages of completed tasks (99) that are DDDT2 ts01 for this device. As noted above, this started on the second ts01 returned on the 20th and I've heard no screams, so it's likely just FYI, and it's for DDDT2 only!

References found:
On glibc detected: http://ubuntuforums.org/showthread.php?t=175050
On export ... : http://www.google.com/search?client=ubuntu&am...amp;ie=utf-8&oe=utf-8

Description:

Recent versions of Linux libc (later than 5.4.23) and GNU libc (2.x) include a malloc implementation which is tunable via environment variables. When MALLOC_CHECK_ is set, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free() with the same argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however, and memory leaks can result.

If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored and an error message is not generated; if set to 1, the error message is printed on stderr, but the program is not aborted; if set to 2, abort() is called immediately, but the error message is not generated; if set to 3, the error message is printed on stderr and the program is aborted. This can be useful because otherwise a crash may happen much later, and the true cause of the problem is then very hard to track down.

My guess from reading this is that 1 or 3 was in effect.

---//--

edit: Well, the first result is in after a set, a client restart and a boot, and the log now looks to show only the two additional restarts, which is correct:

Result Log
Result Name: ts01_ a094_ pqb009_ 1--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.
Calling gridPlatform.init()
Copying wcgrestart.rst
called boinc_finish
</stderr_txt>
]]>

edit 2: Setting export MALLOC_CHECK_=0 is possibly impacting performance, so I will watch the efficiencies, which generally run at 99.8% for this science. From the second source:

*** glibc detected *** double free or corruption: 0x0937d008 ***

By default, the program that generated this error will also be killed. However, this (and whether or not an error message is generated) can be controlled via the MALLOC_CHECK_ environment variable. The following settings are supported:

0 - Do not generate an error message, and do not kill the program
1 - Generate an error message, but do not kill the program
2 - Do not generate an error message, but kill the program
3 - Generate an error message and kill the program

Note: if MALLOC_CHECK_ is explicitly set to a value other than 0, this causes glibc to perform more tests that are more extensive than the default, and may impact performance.

edit 3: If the techs could please chip in... is this really benign, or something else to fix??

[Edit 3 times, last edit by Former Member at Feb 27, 2011 10:29:24 AM]
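For anyone who wants the four settings in one place, here is a minimal Python sketch that decodes a MALLOC_CHECK_ value strictly according to the descriptions quoted in this post (message on/off, kill on/off). The function names describe_malloc_check and env_with_malloc_check are made up for illustration and are not part of BOINC, WCG or glibc; newer glibc versions may behave differently from the quoted text.

```python
import os


def describe_malloc_check(value):
    """Decode MALLOC_CHECK_ (0-3) per the descriptions quoted in this thread.

    Assumption: the two quoted descriptions are accurate for the glibc in use.
    """
    if value not in (0, 1, 2, 3):
        raise ValueError("this sketch only covers the quoted values 0..3")
    return {
        "prints_message": value in (1, 3),  # 1 and 3 print to stderr
        "aborts": value in (2, 3),          # 2 and 3 kill the program
    }


def env_with_malloc_check(value, base=None):
    """Return a copy of an environment mapping with MALLOC_CHECK_ set."""
    describe_malloc_check(value)  # reuse the range check above
    env = dict(os.environ if base is None else base)
    env["MALLOC_CHECK_"] = str(value)
    return env


if __name__ == "__main__":
    for v in range(4):
        print(v, describe_malloc_check(v))
```

The dictionary returned by env_with_malloc_check can be passed as the env argument of subprocess.run, which limits the setting to one child process instead of exporting it for the whole shell session.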
bieberj
Senior Cruncher United States Joined: Dec 2, 2004 Post Count: 406 Status: Offline
Looks like the source code needs to be examined to find the instance of a double free.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Whatever is up with this, MALLOC_CHECK_=0 suppresses all the occurrences on DDDT2 except for the 'sr' type. As the documentation warned, this had a performance impact: efficiency dropped by about 0.2%. Since my results continue to be declared 100% valid, including on pairings that initially turn inconclusive (in those cases it was the original wingman who was declared invalid), I took the envar off again, and indeed efficiency is back to 99.8%. I'll let it be until told how to ''invisibilize'' this in the logs.

----------------------------------------
Sample 'sr' log:

Result Log
Result Name: ts01_ a138_ sr56b1_ 1--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.
called boinc_finish
*** glibc detected *** double free or corruption (fasttop): 0x1eeb6a10 ***
SIGABRT: abort called
Stack trace (20 frames):
[0x86d160f] [0x873aeb0] [0xf774f400] [0x87442e4] [0x875a58b] [0x875f81d] [0x875fbd7] [0x87299e5] [0x86d52f6] [0x86cff5b]
[0x8744cc7] [0x86cda8c] [0x86cdb9b] [0x8051111] [0x83f31aa] [0x84412ce] [0x86adc03] [0x804f4d3] [0x873cfaa] [0x8048131]
Exiting...
</stderr_txt>
]]>

Edit: Will think about reinstalling the official RPM of 6.10.58 and see.

[Edit 1 times, last edit by Former Member at Feb 28, 2011 11:53:37 AM]
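To see how much the two stack dumps posted in this thread really differ, here is a small Python sketch that pulls the glibc line and the frame addresses out of a stderr excerpt and lists the positions where two traces disagree. The log text is copied from the 'pr' and 'sr' samples in this thread; the helper names parse_stderr and differing_frames are hypothetical, and the regular expressions only assume the layout seen in these two samples.

```python
import re

# Matches e.g. "*** glibc detected *** double free or corruption (fasttop): 0x1ec2da10 ***"
GLIBC_RE = re.compile(r"\*\*\* glibc detected \*\*\* (.+?): (0x[0-9a-f]+) \*\*\*")
FRAME_RE = re.compile(r"\[(0x[0-9a-f]+)\]")

LOG_PR = """*** glibc detected *** double free or corruption (fasttop): 0x1ec2da10 ***
SIGABRT: abort called Stack trace (20 frames):
[0x86d160f] [0x873aeb0] [0xf77a6400] [0x87442e4] [0x875a58b] [0x875f81d] [0x875fbd7] [0x87299e5] [0x86d52f6] [0x86cff5b]
[0x8744cc7] [0x86cda8c] [0x86cdb9b] [0x8051111] [0x83f31aa] [0x84412ce] [0x86adc03] [0x804f4d3] [0x873cfaa] [0x8048131]
"""

LOG_SR = """*** glibc detected *** double free or corruption (fasttop): 0x1eeb6a10 ***
SIGABRT: abort called Stack trace (20 frames):
[0x86d160f] [0x873aeb0] [0xf774f400] [0x87442e4] [0x875a58b] [0x875f81d] [0x875fbd7] [0x87299e5] [0x86d52f6] [0x86cff5b]
[0x8744cc7] [0x86cda8c] [0x86cdb9b] [0x8051111] [0x83f31aa] [0x84412ce] [0x86adc03] [0x804f4d3] [0x873cfaa] [0x8048131]
"""


def parse_stderr(text):
    """Return error text, reported address and frames, or None if no glibc line."""
    match = GLIBC_RE.search(text)
    if match is None:
        return None
    frames = FRAME_RE.findall(text[match.end():])
    return {"error": match.group(1), "address": match.group(2), "frames": frames}


def differing_frames(frames_a, frames_b):
    """Indices (0-based) at which two equally ordered traces disagree."""
    return [i for i, (a, b) in enumerate(zip(frames_a, frames_b)) if a != b]


if __name__ == "__main__":
    pr, sr = parse_stderr(LOG_PR), parse_stderr(LOG_SR)
    print(pr["address"], sr["address"])
    print(differing_frames(pr["frames"], sr["frames"]))
```

Run on these two samples, the reported heap address differs and only the third frame (index 2) differs; the other 19 frames are identical, which suggests both results abort along the same code path.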