Total posts in this thread: 7
This topic has been viewed 813 times and has 6 replies
abennett
Cruncher
Joined: Dec 11, 2005
Post Count: 10
Status: Offline
5.10.22 and Windows 2000 hard hangs

I'm still occasionally having problems on one of my machines that is running BOINC. Every now and then, when I come back to the machine after a period of time, the monitor goes to a sleep state, and the keyboard does not respond. I can only get the machine back by doing a hard restart.

I'm running 5.10.22 on a dual-core AMD X2 3800 that has Windows 2000 Pro on it. No screensaver is enabled, but BOINC is using 100% of CPU time. I have 2 GB of memory. This machine mostly sits idle, with occasional web browser use. I've never seen it use most of its physical RAM.

I do not think I have a heat problem:
- this machine previously ran distributed.net's client for months without issues
- temperature checks right after I detect a hang are in range

This morning, when looking at the client, I noticed a status of "Computational Error" for a Dengue unit. Poking around, these messages look relevant:

11/3/2007 8:14:07 AM|World Community Grid|Restarting task dddt0201b0112_ZINC05460445-0000_00_0 using dddt version 510
11/3/2007 8:14:16 AM|World Community Grid|Deferring communication for 1 min 0 sec
11/3/2007 8:14:16 AM|World Community Grid|Reason: Unrecoverable error for result dddt0201b0112_ZINC05460445-0000_00_0 ( - exit code -1 (0xffffffff))
11/3/2007 8:14:16 AM|World Community Grid|Computation for task dddt0201b0112_ZINC05460445-0000_00_0 finished

Granted, these messages are from the restart, and not from the incident leading to the hang.
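
For anyone puzzled by the two forms of the exit code in that message, here is a minimal sketch of why -1 and 0xffffffff are the same value: the hex form is just the signed code read as an unsigned 32-bit number. The helper name `as_unsigned32` is made up for illustration and is not part of BOINC.

```python
def as_unsigned32(code: int) -> str:
    """Format a signed exit code as its unsigned 32-bit hex representation."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits (two's complement view).
    return f"0x{code & 0xFFFFFFFF:08x}"


print(as_unsigned32(-1))  # 0xffffffff
```

So the message reports one generic failure code, shown twice, rather than two separate errors.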

Checking stdoutdae.txt, I have rows of numbers that precede the status messages from the reboot/restart:

0.
0.001
-0.001
-0.001
-0.002
-0.005
-0.007
-0.011
-0.022
-0.032
-0.042
-0.058
-0.082
-0.11
To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK
2007-11-03 08:13:58 [---] Starting BOINC client version 5.10.22 for windows_intelx86
2007-11-03 08:13:58 [---] log flags: task, file_xfer, sched_ops
2007-11-03 08:13:58 [---] Libraries: libcurl/7.16.1 OpenSSL/0.9.8e zlib/1.2.3
2007-11-03 08:13:58 [---] Data directory: D:\BOINC


Literally, pages and pages of random numbers. stdoutdae.txt is over 1 MB in size, and I'd guess that 98% of it is numbers. Where does that 'To pause/resume tasks' line come from? Why is that in a log that I opened in Notepad? Is that the hang symptom?
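
The "98%" figure is a guess, and it could be checked rather than estimated. A rough sketch, assuming a "number line" is any non-blank line holding nothing but one signed decimal such as `0.`, `0.001` or `-0.11`; the names `NUMERIC_LINE` and `numeric_line_share` are made up for illustration.

```python
import re

# A line that is only a signed decimal number, e.g. "0.", "0.001", "-0.11".
NUMERIC_LINE = re.compile(r"^\s*[-+]?(\d+\.?\d*|\.\d+)\s*$")


def numeric_line_share(lines):
    """Return the fraction of non-blank lines that contain only a number."""
    non_blank = [line for line in lines if line.strip()]
    if not non_blank:
        return 0.0
    numeric = sum(1 for line in non_blank if NUMERIC_LINE.match(line))
    return numeric / len(non_blank)


sample = [
    "0.",
    "0.001",
    "-0.11",
    "To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK",
    "2007-11-03 08:13:58 [---] Data directory: D:\\BOINC",
]
print(numeric_line_share(sample))  # 0.6

# To check the real log (path assumed from the "Data directory" line above):
# with open(r"D:\BOINC\stdoutdae.txt", errors="replace") as log:
#     print(numeric_line_share(log))
```

The share is by line count, not by bytes, so it will not match a file-size percentage exactly.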

My issue is very similar to http://www.worldcommunitygrid.org/forums/wcg/...?thread=9381&offset=0, with the dual cores and the computation errors, although I'm not using the BOINC screen saver.

Help! Babysitting my machine is getting frustrating, not to mention the lost time and potentially lost units.


sad
----------------------------------------
[Edit 2 times, last edit by abennett at Nov 3, 2007 3:59:47 PM]
[Nov 3, 2007 3:56:25 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 5.10.22 and Windows 2000 hard hangs

Computer freezing isn't caused directly by BOINC. However, sometimes the additional stress on your computer can cause such problems. You have ruled out overheating, but faulty or failing power supplies have also been linked to this problem.

Such hangs and resets have probably been the cause of any BOINC errors you see, not the other way around.

I'm not sure what the numbers are doing in your log.
[Nov 3, 2007 4:09:44 PM]
abennett
Cruncher
Joined: Dec 11, 2005
Post Count: 10
Status: Offline
Re: 5.10.22 and Windows 2000 hard hangs

So why does my pc behave as expected using distributed.net's client for their crunching projects, but not BOINC?
[Nov 3, 2007 5:06:59 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: 5.10.22 and Windows 2000 hard hangs

This week I've been experimenting with pulling in jobs from different DC projects, and I can assure you that there are hardware workouts and then there are hardware workouts. Projects differ in the amount of RAM used, the amount of disk space, the disk I/O, and the parts of the CPU they exercise, including the L1/L2 caches. That puts real thermal pressure on the hardware over the course of a job, or a combination of jobs, at times speeding the fans up to 90% of max to stay below the CPU's desired temperature ceiling.

If you suspect a bad job, check the Work Unit quorum detail on the Result Status page. If only a single error occurs in the quorum, it is usually a client-side error rather than a program error.

As for your 'fault', I found only one report in the whole of the BOINC projects realm with the identical error code and hex value. For now, I'd suggest a thorough test with, e.g., memtest86 to see whether any failures occur and whether they are intermittent or repeating.
[Nov 3, 2007 5:19:17 PM]
Diana G.
Master Cruncher
Joined: Apr 6, 2005
Post Count: 3003
Status: Offline
Re: 5.10.22 and Windows 2000 hard hangs

abennett
<snip>..
Checking stdoutdae.txt, I have rows of numbers that precede the status messages from the reboot/restart:

0.
0.001
-0.001
-0.001
-0.002
-0.005
-0.007
-0.011
-0.022
-0.032
-0.042
-0.058
-0.082
-0.11
To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK
2007-11-03 08:13:58 [---] Starting BOINC client version 5.10.22 for windows_intelx86
2007-11-03 08:13:58 [---] log flags: task, file_xfer, sched_ops
2007-11-03 08:13:58 [---] Libraries: libcurl/7.16.1 OpenSSL/0.9.8e zlib/1.2.3
2007-11-03 08:13:58 [---] Data directory: D:\BOINC


Literally, pages and pages of random numbers. stdoutdae.txt is over 1 MB in size, and I'd guess that 98% of it is numbers. Where does that 'To pause/resume tasks' line come from? Why is that in a log that I opened in Notepad? Is that the hang symptom?

<snip>

sad



Just my thought on this... I think that line simply appears whenever you reboot your PC. I have seen it exactly like this after each and every reboot. I don't think it means anything other than that.

smile
----------------------------------------

[Nov 3, 2007 5:21:22 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: 5.10.22 and Windows 2000 hard hangs

Yes. The stdoutdae file contains everything written to stdout. This includes the messages, of course, but it can also include other things. The pause/resume tasks line is a perfect example of this. It is shown on stdout for the benefit of people running BOINC in a console window.

The numbers, though - I don't recall seeing them before, as I said. If I had to guess, I would say that the science application was writing to stdout instead of stderr, and since stdout isn't redirected, it is going to the parent process stdout and getting logged in the wrong place.
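
To illustrate that guess, here is a small sketch of how a child's unredirected stdout ends up in whatever file its parent's stdout points at. `APP` and `CLIENT` are hypothetical stand-ins written for this example; they are not BOINC or any project's science application, and the real client may launch its tasks differently.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a science application that prints to stdout.
APP = "print('-0.011')"

# Hypothetical stand-in for the client: it writes one log line, then starts
# the application WITHOUT redirecting the application's stdout.
CLIENT = (
    "import subprocess, sys\n"
    "print('Starting client', flush=True)\n"
    f"subprocess.run([sys.executable, '-c', {APP!r}])\n"
)


def demo():
    """Run the fake client with stdout sent to a log file; return its lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stdoutdae.txt")
        with open(path, "w") as log:
            # Only the client's stdout is redirected here; the application
            # inherits that same stdout from the client.
            subprocess.run([sys.executable, "-c", CLIENT], stdout=log, check=True)
        with open(path) as log:
            return log.read().splitlines()


print(demo())
```

In this sketch the application's number lands in the same file as the client's own message. Giving the child its own `stdout=` target in the `subprocess.run` call inside `CLIENT` would keep the two apart.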

Does anyone want to try to determine which project is doing this? Diana, which project(s) are you running?
[Nov 3, 2007 6:19:14 PM]
Diana G.
Master Cruncher
Joined: Apr 6, 2005
Post Count: 3003
Status: Offline
Re: 5.10.22 and Windows 2000 hard hangs

Didactylos, I run all the projects but haven't received the AC@H or the HCC yet.
I don't see any numbers in my stdoutdae and it shows that each time I reboot, it writes the pause/resume tasks line. Everything is normal.
----------------------------------------

[Nov 4, 2007 3:25:04 AM]