Thread Status: Active
Total posts in this thread: 18
This topic has been viewed 1865 times and has 17 replies
jonathandl
Advanced Cruncher
Joined: Nov 12, 2007
Post Count: 106
Status: Offline
BOINC crash

BOINC manager crashed, near the end of the computation for this work unit:

dddt0401o0873_100446_1

(Also the next time I opened the BOINC manager the progress of the result in question showed as some funny number around 103% even though the result wasn't quite completed yet.)

This has happened a couple of times before (all three incidents occurred after the change to the new, longer work units), but this most recent time I did not Reset Project immediately after the occurrence. I have an Intel-based Mac mini running Mac OS X 10.4.11, with 2 GB of RAM and 50 GB of free disk space. I upgraded to BOINC 5.10.45 after the first crash and before the second one. Excerpts from the client_state.xml and stderrgui.txt files are available upon request. Thanks.
----------------------------------------
[Edit 2 times, last edit by jonathandl at Mar 16, 2008 8:02:38 AM]
[Mar 16, 2008 7:55:40 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BOINC crash

BOINC Manager crashed? The work unit you were running at the time is irrelevant, but thank you for mentioning it.

The log files will be very useful, as well as a description of the crash and error message.

Thank you.
[Mar 16, 2008 8:03:31 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: BOINC crash

Hi Jonathandl,

When a DDDT unit completes properly, the behaviour observed for the present quorum-2 DDDT units (on Windows) is approximately:

- The % progress stops at about 99.8xx, but the CPU time keeps counting. The Time To Complete stops at about 45 seconds left.
- After about 1 minute, the % jumps to 100.000 and the Time To Complete shows '---'.
- After a few seconds the % rapidly increases to about 109.xxx, and the CPU counter continues with it.
- All counters stop and the Activity column switches from Running to Uploading. The % progress returns to 100.000.
- When uploading is complete, it switches to 'Ready to Report'. That stays until the next scheduler contact with the server.

This is all part of the wrap-up cycle for DDDT, which includes local verification that the unit was properly completed. I have not seen a unit complete on a Mac, but I presume the behaviour is very similar there and on any other platform DDDT is distributed to.

If this is what you saw, you're all fine.

ciao
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 1 times, last edit by Sekerob at Mar 16, 2008 12:11:35 PM]
[Mar 16, 2008 9:00:01 AM]
jonathandl
Advanced Cruncher
Joined: Nov 12, 2007
Post Count: 106
Status: Offline
Re: BOINC crash

Didactylos wrote:
BOINC Manager crashed? The work unit you were running at the time is irrelevant, but thank you for mentioning it.

The reason I mentioned it is that I was hoping you would be so kind as to ask whether the admins can flag the work unit for a stricter-than-usual validation process, or even mark my result as "inconclusive".

Didactylos wrote:
The log files will be very useful, as well as a description of the crash and error message.

Thank you.

The BOINC manager unexpectedly stops running while I am away from the computer; when I get back, the entire computer, instead of just the video display, is "asleep," and when I "wake" the computer, the BOINC manager window is closed. There is no error message.

Here is an excerpt from stderrgui.txt:
connect: Operation now in progress
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check
SIGABRT: abort called

Crashed executable name: BOINCManager
built using BOINC library version 5.10.45
Machine type Intel 80486
System version: Macintosh OS 10.4.11 build 8S2167
Sat Mar 15 01:11:25 2008
Thread 0 Crashed:
0 0x0009dc57
1 0x0009de45
2 0x0009c51b
3 0x000151f1
4 /usr/lib/libSystem.B.dylib 0x9003d66c _kill + 12
5 0000000000
6 /usr/lib/libSystem.B.dylib 0x9010e8cf _raise + 26
7 /usr/lib/libSystem.B.dylib 0x9010d422 _abort + 77
8 /usr/lib/libstdc++.6.dylib 0x90b4539c __ZN9__gnu_cxx27__verbose_terminate_handlerEv + 492
9 /usr/lib/libstdc++.6.dylib 0x90b43602 ___gxx_personality_v0 + 1130
10 /usr/lib/libstdc++.6.dylib 0x90b43640 __ZSt13set_terminatePFvvE + 0
11 /usr/lib/libstdc++.6.dylib 0x90b43754 ___cxa_rethrow + 0
12 /usr/lib/libstdc++.6.dylib 0x90b05483 __ZSt20__throw_out_of_rangePKc + 187
13 0x00060606
14 0x00060626
15 0x0007210a
16 0x0005f717
17 0x0005fb33
18 0x000bfaa2
19 0x000bfc06
20 0x000bfffe
21 0x001070b5
22 ...mework/Versions/A/HIToolbox 0x92e1fa4a _TimerVector + 31
23 ...k/Versions/A/CoreFoundation 0x9082d76a _CFRunLoopRunSpecific + 3341
24 ...k/Versions/A/CoreFoundation 0x9082ca56 _CFRunLoopRunInMode + 61
25 ...mework/Versions/A/HIToolbox 0x92de7878 _RunCurrentEventLoopInMode + 285
26 ...mework/Versions/A/HIToolbox 0x92de6f82 _ReceiveNextEventCommon + 385
27 ...mework/Versions/A/HIToolbox 0x92efd99c _ReceiveNextEvent + 58
28 0x0014a2af
29 0x001ad223
30 0x001ad084
31 0x000aa378
32 0x0019c0e0
33 0x0000e034
34 0x00002352
35 0x00002279

Thread 0 crashed with X86 Thread State (32-bit):
eax: 0x00000000 ebx: 0x00000000 ecx: 0x00000000 edx: 0x00000000
edi: 0x00000000 esi: 0x00000000 ebp: 0xbfffe568 esp: 0x00000000
ss: 0x00000000 efl: 0x00000000 eip: 0x0009dc57 cs: 0x00000000
ds: 0x00000000 es: 0x00000000 fs: 0x00000000 gs: 0x00000000

Binary Images Description:
0x1000 - 0x3c6fff /Applications/BOINCManager.app/Contents/MacOS/BOINCManager
(rest of "Binary Images Description" omitted for brevity).

Here is an excerpt from the console.log:
Mar 14 23:11:42 jonathans-computer DirectoryService[44]: Failed Authentication return is being delayed due to over five recent auth failures for username: jdl.
Mar 15 01:12:58 jonathans-computer cÑ⁄Gƒ [153]: sleep request recorded: Sat Mar 15 01:12:58 2008\n\n
Mar 15 01:12:59 jonathans-computer cÑ⁄Gƒ [153]: sleep demand recorded: Sat Mar 15 01:12:59 2008\n\n
Note that stderrgui.txt shows the apparent time of crash as 03/15 at 01:11:28, and the sleep event occurred at 01:12:58, so it is extremely unlikely that the sleep event caused the crash. Also, this only started happening with the new larger dddt work units; with the smaller units the display went to sleep and the units continued to crunch in the background, as intended.
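
The timing argument above can be checked with a little arithmetic. The following is only a sketch: it uses the timestamp printed in the header of the posted stderrgui.txt excerpt (01:11:25) and the first sleep request from console.log, and the variable names are made up for illustration.

```python
from datetime import datetime

# Timestamp format used in both log excerpts, e.g. "Sat Mar 15 01:11:25 2008"
FMT = "%a %b %d %H:%M:%S %Y"

# Values copied from the excerpts posted above (variable names are illustrative)
crash_time = datetime.strptime("Sat Mar 15 01:11:25 2008", FMT)
sleep_request = datetime.strptime("Sat Mar 15 01:12:58 2008", FMT)

# A positive gap means the crash was logged before the sleep request
gap_seconds = (sleep_request - crash_time).total_seconds()
print(f"Crash logged {gap_seconds:.0f} seconds before the sleep request")
```

Since the crash was logged about a minute and a half before the sleep request, the ordering is consistent with the sleep event not being the cause.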

The client_state.xml file, as I mentioned, is extremely long; if you need it then I can e-mail it to you. Likewise, the stderrgui.txt file is long, but at least I have an idea of which part seems most relevant, which I posted; if you need the rest then please give me an e-mail address that I can send it to. Thanks!
[Mar 16, 2008 12:32:01 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: BOINC crash

Do you have some stdoutdae.txt log content from around the problem time, say from 30 minutes before until after the recorded 'sleep' time?

thanks
[Edit 1 times, last edit by Sekerob at Mar 16, 2008 12:44:27 PM]
[Mar 16, 2008 12:43:35 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BOINC crash

I have passed on both your reports to the BOINC developers.

This is not related to DDDT. That is one of the few things I am certain of.

Thank you for your reports.
[Mar 16, 2008 12:49:21 PM]
jonathandl
Advanced Cruncher
Joined: Nov 12, 2007
Post Count: 106
Status: Offline
Re: BOINC crash

Sekerob wrote:
Do you have some stdoutdae.txt log content from around the problem time, say from 30 minutes before until after the recorded 'sleep' time?

thanks

Now this is where it gets really strange. It says that I quit the application. Well, I can tell you this much: the first two times it happened, I definitely did not quit it. And it's very unlikely that I would have quit the application by mistake three times in three weeks after using BOINC (with small DDDT work units) for months without ever quitting accidentally. (Also, when I quit normally from the BOINC menu or by pressing Apple-Q, it pops up a nice friendly dialog, which of course never happened in association with any of these crashes. And if I had quit by shutting down, surely I would have noticed it when the computer rebooted, not to mention the fact that console.log would have filled up with boot-sequence messages.)
14-Mar-2008 22:41:33 [---] Suspending computation - user request
14-Mar-2008 22:41:57 [---] Resuming computation
14-Mar-2008 23:57:01 [---] Suspending network activity - time of day
15-Mar-2008 01:11:27 [---] Exit requested by user
15-Mar-2008 02:36:35 [---] Starting BOINC client version 5.10.45 for i686-apple-darwin
15-Mar-2008 02:36:35 [---] log flags: task, file_xfer, sched_ops
15-Mar-2008 02:36:35 [---] Libraries: libcurl/7.18.0 OpenSSL/0.9.7l zlib/1.2.3 c-ares/1.5.1
15-Mar-2008 02:36:35 [---] Data directory: /Library/Application Support/BOINC Data
15-Mar-2008 02:36:35 [---] Processor: 1 GenuineIntel Genuine Intel(R) CPU 200 @ 1.50GHz [x86 Family 6 Model 14 Stepping 8]
15-Mar-2008 02:36:35 [---] Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT CLFSH DS ACPI MMX FXSR SSE SSE2 SS TM SSE3 MON VMX EST TM2 TPR PDCM
15-Mar-2008 02:36:35 [---] OS: Darwin: 8.11.1
15-Mar-2008 02:36:35 [---] Memory: 2.00 GB physical, 50.67 GB virtual
15-Mar-2008 02:36:35 [---] Disk: 55.57 GB total, 50.43 GB free
15-Mar-2008 02:36:35 [---] Local time is UTC -4 hours
15-Mar-2008 02:36:35 [World Community Grid] URL: http://www.worldcommunitygrid.org/; Computer ID: 434452; location: (none); project prefs: default
15-Mar-2008 02:36:35 [---] General prefs: from World Community Grid (last modified 03-Dec-2007 10:18:16)
15-Mar-2008 02:36:35 [---] Host location: none
15-Mar-2008 02:36:35 [---] General prefs: using your defaults
15-Mar-2008 02:36:35 [---] Reading preferences override file
15-Mar-2008 02:36:35 [---] Preferences limit memory usage when active to 1536.00MB
15-Mar-2008 02:36:35 [---] Preferences limit memory usage when idle to 1536.00MB
15-Mar-2008 02:36:35 [---] Preferences limit disk usage to 9.31GB
15-Mar-2008 02:36:35 [---] Suspending network activity - time of day
15-Mar-2008 02:36:36 [World Community Grid] Restarting task dddt0401o0873_100446_1 using dddt version 515
15-Mar-2008 02:39:11 [World Community Grid] Computation for task dddt0401o0873_100446_1 finished

Yes, I did pause and resume about 2 hours 30 minutes before the crash.

Didactylos wrote:
I have passed on both your reports to the BOINC developers.

This is not related to DDDT. That is one of the few things I am certain of.

Thank you for your reports.

I readily agree that it's the BOINC manager and not the DDDT science application that crashed; however, just to be safe, I still strongly suggest that you please subject the Result in question to a more rigorous Validation process, and if the result furnished by the other computer computing the same Work Unit differs in the slightest respect from mine, then mark the other Result as "canonical," or mark mine as "inconclusive."
[Mar 16, 2008 2:17:47 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BOINC crash

Jonathan, if either result differs by the tiniest amount, they are both marked "inconclusive" until one of them is matched by an additional copy of the work unit. World Community Grid errs on the side of caution.
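
The matching rule just described can be sketched roughly as follows. This is an illustration only, not World Community Grid's actual validator: the function name `classify_results`, the quorum of 2 (taken from the "quorum 2" mentioned earlier in the thread), and the "invalid" label for an outvoted result are all assumptions.

```python
from collections import Counter

def classify_results(results, quorum=2):
    """Classify returned results for one work unit.

    results: dict mapping a host name to its (hashable) result value.
    Hypothetical rule: nothing is accepted until `quorum` results agree.
    """
    if not results:
        return {}
    # Find the most common result value and how many hosts returned it
    value, matches = Counter(results.values()).most_common(1)[0]
    if matches < quorum:
        # No agreement yet: every result waits for an additional copy
        return {host: "inconclusive" for host in results}
    # Agreement reached; labelling the odd one out "invalid" is an assumption
    return {host: ("valid" if r == value else "invalid")
            for host, r in results.items()}

# Two hosts disagree, so both stay inconclusive
print(classify_results({"host_a": "x", "host_b": "y"}))
# A third copy matches host_a, which settles the work unit
print(classify_results({"host_a": "x", "host_b": "y", "host_c": "x"}))
```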

If you look at the crash data, you will see that it terminated with SIGABRT. Evidently this is interpreted as a normal, user shutdown. The message is misleading; just ignore it.
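
For comparison, here is a minimal sketch of how a supervising process on a POSIX system can tell an abort from a normal exit. It is not BOINC's code and says nothing about how BOINC itself produces the "Exit requested by user" message; the names `describe_exit` and `run_aborting_child` are hypothetical.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode == 0:
        return "normal exit"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with error status {returncode}"

def run_aborting_child():
    """Start a child Python process that calls abort() and report how it ended."""
    child = subprocess.run([sys.executable, "-c", "import os; os.abort()"])
    return describe_exit(child.returncode)

if __name__ == "__main__":
    # On a POSIX system the child is expected to be reported as killed by SIGABRT
    print(run_aborting_child())
```

A logger that inspected the exit status in this way could report the abort as a crash instead of a user-requested exit.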
[Mar 16, 2008 2:33:28 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: BOINC crash

I'm confused. OS X 10.4.11 is reported in the OP, yet the log says OS: Darwin 8.11.1. Is that the same thing?

[edit: Is one Intel-based and the other PowerPC-based? http://en.wikipedia.org/wiki/Darwin_(operating_system)]
[Edit 1 times, last edit by Sekerob at Mar 16, 2008 3:07:13 PM]
[Mar 16, 2008 2:45:57 PM]
jonathandl
Advanced Cruncher
Joined: Nov 12, 2007
Post Count: 106
Status: Offline
Re: BOINC crash

Didactylos wrote:
Jonathan, if either result differs by the tiniest amount, they are both marked "inconclusive" until one of them is matched by an additional copy of the work unit. World Community Grid err on the side of caution.

Thanks.
By the way, it crashed again on 03/16/2008 at 19:09:09 EDT. Would you like logs?
[Mar 17, 2008 5:55:01 AM]