| World Community Grid Forums
Thread Status: Locked Total posts in this thread: 17
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
One of my two computers just crashed, and one of your Human Proteome Folding jobs appears to be responsible.
Its BOINC completed some job or another, started a Rosetta task, and Rosetta promptly crashed with some sort of "C++ runtime error". That machine's BOINC moved on to a second Rosetta job, and within a few minutes it crashed in the same manner. Then the following all crashed within a few minutes of starting: an FAAH job, another FAAH job, and another HPF job. Then OTHER applications started crashing with the same error message. Just about everything crashed, including Explorer, and Explorer restarted. The computer had become unusable, however -- running anything would cause it to immediately crash with the same error message. Not a normal "This program has encountered a problem..." but some sort of C++ runtime error message, with only slight variations.
I had Task Manager running, and noticed that everything that crashed first leaked -- no, hemorrhaged memory, bloating to over 1GB process size(!) before crashing.
The computer was working fine right up until that first Rosetta job crash. It was working fine yesterday at this time, and the day before. I hadn't changed anything. As far as I can tell, the first Rosetta crash started some kind of chain reaction that corrupted the system in some way. In other words, Rosetta has a severe bug, and work units that have some particular X-factor in them can trigger this bug, which will cause Rosetta to crash and corrupt the system.
The truly amazing thing is that this isn't some poky Windows ME box. It's running XP Pro, a fully 32-bit protected-mode operating system that is supposed to prevent application faults from corrupting the entire system. I've never seen one crash before other than due to kernel-mode failures, usually in a video driver, when the crash takes the form of a blue screen. This wasn't a BSOD, yet the system did become corrupted, and it started with a Rosetta binary running and then crashing. Please look into this.
Anything that can take down a protected-mode operating system without having to run in kernel mode is a serious problem indeed, and might even be abused. There's clearly also a bug in WinXP for this to even be possible. With any luck, MS will fix it soon, since it has security implications (obviously it enables an unprivileged user to launch a denial of service attack on an XP box, and it might enable one to gain privilege depending on the exact nature of the bug). But until then it behooves WCG to try not to crash members' computers; widespread occurrences of what just happened to me will discourage participation, which I doubt you would enjoy. Even if MS fixes *their* bug, Rosetta crashing sometimes will rob users of points, WCG of usable HPF results, and the whole community of CPU cycles that go down the drain. Oh, and the "runtime error" thing steals focus, which is annoying. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It gets worse.
Whatever your crashing HPF job did to that computer, it has *survived a reboot*. The computer, in other words, is going to need a complete reinstall of Windows because of you. I don't know what you did, but the effects are drastic. The computer, after rebooting, was more usable than immediately before rebooting, but after running only a short time one of your HPF tasks crashed again, followed almost immediately by everything else in BOINC's queue on that machine, and other applications started failing to start up. I also saw the excessive memory leak behavior again -- Explorer bloated up suddenly to a 1.2GB process size in a matter of seconds. I rebooted it again and shut down BOINC right after startup. Worked with the machine for no more than ten minutes before Explorer went tits-up with the same error message as before. This time it had suddenly bloated up to 1.3GB. What in Christ's name has your crashing Rosetta job done to my computer?! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Oh, and by the way, after the next reboot the affected machine wouldn't even start up completely. Desktop comes up and a blank taskbar but the tray and Start button are just blue rectangles. I can fire up Task Manager with the three-finger salute and see that nothing much is happening.
Congratulations. You seem to have killed one of my PCs. I hope for your sake that my other computer doesn't catch one of these bad HPF jobs. Perhaps I should preemptively shut down BOINC on it and quit WCG, just to be safe. I may just do that. Unless, of course, you can convince me that this will not happen again. |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Sounds like a major borg event, is my first impression. A full reformat and repartitioning seems in order; that is, unless you have a good restore point to go back to. My XP Pro has always done very well in that department.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello twisted0n3,
This is the first crash of this sort ever reported.
"Just about everything crashed, including Explorer, and Explorer restarted."
This sounds like something basic in the system went kablooie. I don't think that anyone will ever be able to definitively pin this on Rosetta. I cannot remember ever seeing a crash report like this over at Rosetta@home either.
Lawrence |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Do you happen to have the name of the WU that you suspect caused the initial crash? Maybe the techs could take a look at it for you.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi twisted0n3.
Have you solved the existing problems with your computer? We can't take your report seriously if it is just a continuation of your existing problems. The last three times you reported a problem, you refused to take our advice. This makes us predisposed to ignore you entirely. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have been ignoring that **** since he posted...
----------------------------------------
**Edited for intolerance** TKH
[Edit 1 times, last edit by TKH at Jan 30, 2008 1:57:44 PM] |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
"Do you happen to have the name of the WU that you suspect caused the initial crash? Maybe the techs could take a look at it for you."
Hi joneill003,
The standard user-side routine is to check the WU detail on the Result Status page. If the other copies were properly returned and are sitting in Pending Validation or, better still with HPF2, show that the minimum quorum of 15 was achieved and marked 'Valid', the probability of it not being a host-specific problem is very small.
Cheers
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
retsof
Former Community Advisor USA Joined: Jul 31, 2005 Post Count: 6824 Status: Offline
|
Is twisted0n3 still playing broken games? He should look there first before blaming it on HPF.
----------------------------------------
Google Results 1 - 10 of about 4,680 for twisted0n3. (0.05 seconds)
SUPPORT ADVISOR
Work+GPU i7 8700 12threads School i7 4770 8threads Default+GPU Ryzen 7 3700X 16threads Ryzen 7 3800X 16 threads Ryzen 9 3900X 24threads Home i7 3540M 4threads50% |
||
|
|
|