| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 18
|
|
|
|
jonathandl
Advanced Cruncher Joined: Nov 12, 2007 Post Count: 106 Status: Offline Project Badges:
|
BOINC manager crashed, near the end of the computation for this work unit:
dddt0401o0873_ 100446_ 1--

(Also, the next time I opened the BOINC manager, the progress of the result in question showed as some funny number around 103%, even though the result wasn't quite completed yet.)

This has happened a couple of times before (all three incidents happened after the change to the new longer work units), but this most recent time I did not bother to Reset Project immediately after the occurrence.

I have an Intel-based Mac mini running Mac OS 10.4.11, with 2GB of total RAM and 50GB of free disk space. I upgraded to 5.10.45 after the first crash and before the second one.

Excerpts from the client_state.xml file and the stderrgui.txt file are available upon request. Thanks.

[Edit 2 times, last edit by jonathandl at Mar 16, 2008 8:02:38 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
BOINC Manager crashed? The work unit you were running at the time is irrelevant, but thank you for mentioning it.
The log files will be very useful, as well as a description of the crash and error message. Thank you. |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Hi Jonathandl,
For a DDDT unit to complete properly, the behaviour observed for the present quorum 2 DDDT (on Windows) is approximately:
- The % progress stops at about 99.8xx, but CPU time continues counting. The Time To Complete stops at about 45 seconds left.
- After about 1 minute, the % jumps to 100.000 and the Time To Complete goes to '---'.
- After a few seconds the % rapidly increases to about 109.xxx, and the CPU counter continues with it.
- All counters stop and the Activity column switches from Running to Uploading. The % progress returns to 100.000.
- When uploading is complete, it switches to 'Ready to Report'. That stays until the next scheduler contact with the server.

This is all part of the wrap-up cycle for DDDT, which includes local verification that the unit was properly completed. Not having seen a Mac complete one, I presume this to be very similar on that and any other platform DDDT is distributed to. If this is what you saw, you're all fine.

ciao
WCG
----------------------------------------
Please help to make the Forums an enjoyable experience for All!
[Edit 1 times, last edit by Sekerob at Mar 16, 2008 12:11:35 PM] |
||
|
|
jonathandl
Advanced Cruncher Joined: Nov 12, 2007 Post Count: 106 Status: Offline Project Badges:
|
Didactylos wrote:
BOINC Manager crashed? The work unit you were running at the time is irrelevant, but thank you for mentioning it.

The reason I mentioned it is that I was hoping you would be so kind as to see if the admins can flag the work unit for a stricter than usual validation process, or even mark my result as "inconclusive".

Didactylos wrote:
The log files will be very useful, as well as a description of the crash and error message. Thank you.

The BOINC manager unexpectedly stops running while I am away from the computer; when I get back, the entire computer, instead of just the video display, is "asleep," and when I "wake" the computer, the BOINC manager window is closed. There is no error message.

Here is an excerpt from stderrgui.txt:
connect: Operation now in progress
(rest of "Binary Images Description" omitted for brevity).

Here is an excerpt from the console.log:
Mar 14 23:11:42 jonathans-computer DirectoryService[44]: Failed Authentication return is being delayed due to over five recent auth failures for username: jdl.

Note that stderrgui.txt shows the apparent time of the crash as 03/15 at 01:11:28, and the sleep event occurred at 01:12:58, so it is extremely unlikely that the sleep event caused the crash. Also, this only started happening with the new larger DDDT work units; with the smaller units the display went to sleep and the units continued to crunch in the background, as intended.

The client_state.xml file, as I mentioned, is extremely long; if you need it then I can e-mail it to you. Likewise, the stderrgui.txt file is long, but at least I have an idea of which part seems most relevant, which I posted; if you need the rest then please give me an e-mail address that I can send it to. Thanks! |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Do you have some stdoutdae.txt log content from around the problem time, say from 30 minutes before until after the recorded 'sleep' time?
thanks
[Edit 1 times, last edit by Sekerob at Mar 16, 2008 12:44:27 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have passed on both your reports to the BOINC developers.
This is not related to DDDT. That is one of the few things I am certain of. Thank you for your reports. |
||
|
|
jonathandl
Advanced Cruncher Joined: Nov 12, 2007 Post Count: 106 Status: Offline Project Badges:
|
Sekerob wrote:
Do you have some stdoutdae.txt log content from around the problem time, say 30 minutes before until after the recorded 'sleep' time. thanks

Now this is where it gets really strange. It says that I quit the application. Well, I can tell you this much: the first two times it happened, I definitely did not quit it. And it's very unlikely that I would have quit the application by mistake three times in three weeks after I have been using BOINC (with small DDDT work units) for months without ever having quit accidentally. (Also, when I quit normally from the BOINC menu or by pressing apple-Q, it pops up a nice friendly dialog, which of course never happened in association with any of these crashes. And if I quit by shutting down, surely I would notice it when the computer reboots, not to mention the fact that the console.log would have filled up with boot-sequence messages.)

14-Mar-2008 22:41:33 [---] Suspending computation - user request

Yes, I did pause and resume about 2 hours 30 minutes before the crash.

Didactylos wrote:
I have passed on both your reports to the BOINC developers. This is not related to DDDT. That is one of the few things I am certain of. Thank you for your reports.

I readily agree that it's the BOINC manager and not the DDDT science application that crashed; however, just to be safe, I still strongly suggest that you please subject the Result in question to a more rigorous Validation process, and if the result furnished by the other computer computing the same Work Unit differs in the slightest respect from mine, then mark the other Result as "canonical," or mark mine as "inconclusive." |
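The "about 2 hours 30 minutes" figure can be checked from the timestamps quoted in this thread. Below is a minimal Python sketch, assuming the crash time reported from stderrgui.txt (03/15 at 01:11:28) refers to 15 March 2008 and writing it in the same layout as the stdoutdae.txt line; the helper name `parse_log_time` is made up for this example.

```python
from datetime import datetime

# Timestamp layout of the quoted log line, e.g. "14-Mar-2008 22:41:33".
LOG_TIME_FORMAT = "%d-%b-%Y %H:%M:%S"


def parse_log_time(line: str) -> datetime:
    """Parse the leading 'DD-Mon-YYYY HH:MM:SS' timestamp of a log line."""
    # The timestamp is the first two whitespace-separated fields.
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TIME_FORMAT)


suspend_line = "14-Mar-2008 22:41:33 [---] Suspending computation - user request"
suspended_at = parse_log_time(suspend_line)

# Assumption: the crash time from stderrgui.txt, rewritten in the same layout.
crashed_at = datetime.strptime("15-Mar-2008 01:11:28", LOG_TIME_FORMAT)

gap = crashed_at - suspended_at
print(gap)  # 2:29:55
```

The interval comes out at 2 hours, 29 minutes and 55 seconds, which matches the estimate given in the post.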
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Jonathan, if either result differs by the tiniest amount, they are both marked "inconclusive" until one of them is matched by an additional copy of the work unit. World Community Grid err on the side of caution.
If you look at the crash data, you will see that it terminated with SIGABRT. Evidently this is interpreted as a normal, user shutdown. The message is misleading; just ignore it. |
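The comparison rule described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not World Community Grid's actual validator: each returned result is stood in for by a plain string, and the function name `validate` and the status labels are hypothetical. What eventually happens to a result that never finds a match is not stated in the thread, so the sketch simply leaves it as "inconclusive".

```python
from collections import Counter


def validate(results: list[str]) -> dict[int, str]:
    """Give each returned result a status, keyed by its position in the list.

    A result counts as 'valid' only when at least one other returned copy
    matches it exactly; otherwise it stays 'inconclusive'.
    """
    counts = Counter(results)
    return {
        index: ("valid" if counts[value] >= 2 else "inconclusive")
        for index, value in enumerate(results)
    }


# Two copies that disagree: both stay inconclusive.
print(validate(["A", "B"]))
# A third copy matching the first one settles it for the matching pair.
print(validate(["A", "B", "A"]))
```

With two differing copies neither result is accepted, which is why a crash on one machine cannot silently push a bad result through on its own.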
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
I'm confused. OS X 10.4.11 is reported in the OP, yet the log speaks of MAC OS: Darwin 8.11.1. Is that the same thing?
[edit: One Intel and the other PowerPC processor based? http://en.wikipedia.org/wiki/Darwin_(operating_system)]
[Edit 1 times, last edit by Sekerob at Mar 16, 2008 3:07:13 PM] |
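On the Darwin question: to the best of my knowledge, Darwin 8.x is the kernel release underneath Mac OS X 10.4.x, with the second number following the 10.4 point release, so Darwin 8.11.1 and Mac OS X 10.4.11 would describe the same system. A small Python sketch of that mapping follows; the function name is hypothetical and it deliberately covers only Darwin 8.x, since that is the only release discussed here.

```python
def darwin_to_mac_os_x(darwin_version: str) -> str:
    """Map a Darwin 8.x version string to the matching Mac OS X 10.4.x release.

    Assumption: Darwin 8.N corresponds to Mac OS X 10.4.N; any further
    component (such as the trailing .1) is ignored.
    """
    major, minor, *_ = (int(part) for part in darwin_version.split("."))
    if major != 8:
        raise ValueError("this sketch only covers Darwin 8.x (Mac OS X 10.4)")
    return f"10.4.{minor}"


print(darwin_to_mac_os_x("8.11.1"))  # 10.4.11
```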
||
|
|
jonathandl
Advanced Cruncher Joined: Nov 12, 2007 Post Count: 106 Status: Offline Project Badges:
|
Didactylos wrote:
Jonathan, if either result differs by the tiniest amount, they are both marked "inconclusive" until one of them is matched by an additional copy of the work unit. World Community Grid err on the side of caution.

Thanks. By the way, it crashed again on 03/16/2008 at 19:09:09 EDT. Would you like logs? |
||
|
|
|